Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757461AbXHAQwW (ORCPT ); Wed, 1 Aug 2007 12:52:22 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752528AbXHAQwK (ORCPT ); Wed, 1 Aug 2007 12:52:10 -0400 Received: from sca-es-mail-1.Sun.COM ([192.18.43.132]:56409 "EHLO sca-es-mail-1.sun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753392AbXHAQwI (ORCPT ); Wed, 1 Aug 2007 12:52:08 -0400 Date: Wed, 01 Aug 2007 09:57:14 -0700 From: Yinghai Lu Subject: Re: - pci-device-ensure-sysdata-initialised-v3.patch removed from -mm tree In-reply-to: <20070801155730.GA2024@shadowen.org> To: Andy Whitcroft Cc: akpm@linux-foundation.org, cornelia.huck@de.ibm.com, greg@kroah.com, jeff@garzik.org, linux-pci@atrey.karlin.mff.cuni.cz, linux-kernel@vger.kernel.org, Mel Gorman Reply-to: Yinghai.Lu@Sun.COM Message-id: <46B0BB6A.9080101@sun.com> MIME-version: 1.0 Content-type: text/plain; format=flowed; charset=us-ascii Content-transfer-encoding: 7BIT References: <200707270624.l6R6Oj5w009397@imap1.linux-foundation.org> <20070801155730.GA2024@shadowen.org> User-Agent: Thunderbird 2.0.0.5 (X11/20070716) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7541 Lines: 185 Andy Whitcroft wrote: > On Thu, Jul 26, 2007 at 11:24:45PM -0700, akpm@linux-foundation.org wrote: >> The patch titled >> pci device ensure sysdata initialised v3 >> has been removed from the -mm tree. Its filename was >> pci-device-ensure-sysdata-initialised-v3.patch >> >> This patch was dropped because Yinghai Lu's stuff hits on the same code > > I still get panics with 2.6.23-rc1-mm2 so I have rebased this patch > on top of that. I seem to have to trot this one out on a regular > basis and it seems to live its life in -mm. It is clear that there > are broken call sites here and something needs to be done to them to > avoid this. Ideas for alternative solutions or can we get something > like this merged? > > -apw > > === 8< === > pci device ensure sysdata initialised v4 > > We have been seeing panic's on NUMA systems in pci_call_probe() in > 2.6.19-rc1-mm1 and later. This is related to the changes introduced > in the commit below: > > [x86, PCI] Switch pci_bus::sysdata from NUMA node integer to a pointer > 0a247a58fc3e2ecfc17654301033e8b8d08df2a2 > > In this change the sysdata has changed from directly representing > a value (the node number in NUMA) to a pointer to a structure. > However, it seems that we do not always initialise this sysdata > before we probe the device. > > Prior to the changes above the node was defaulted to 'NULL' > allocating the devices to node 0 unconditionally. This patch adds > a default sysdata entry (pci_default_sysdata), this is then used > where 'NULL' was used previously. pci_default_sysdata defaults > the node to unknown (-1). This is a more accurate assignment, > mirroring the value returned where no topology support is provided > and no locality information is available. > > There are only two uses of this value in the affected architectures > (x86, x86_64) and generic code: > > 1) in x86_64, dma_alloc_pages() looks up the node in order to > allocate node local memory. Here if the node is invalid we > will default to the first online node. Behaviour here should > be unchanged. > 2) in generic, pci_call_probe() looks up the node in order to > restrict execution of the probe on the card local node, to > favor node local allocation. Where this is unknown previously > we would force execution (and thereby allocation) to node 0, > this is arguably wrong and using -1 releases this restriction. > > In an ideal world we should be supplying a sysdata for the > appropriate node where it is known. Where it is not known defaulting > to -1 seems a better course, and would help us where node 0 is > short of memory. > > Signed-off-by: Andy Whitcroft > --- > arch/i386/pci/common.c | 2 ++ > arch/i386/pci/fixup.c | 8 +++++--- > arch/i386/pci/numa.c | 8 +++++--- > arch/i386/pci/visws.c | 4 ++-- > include/asm-i386/pci.h | 1 + > include/asm-x86_64/pci.h | 1 + > 6 files changed, 16 insertions(+), 8 deletions(-) > diff --git a/arch/i386/pci/common.c b/arch/i386/pci/common.c > index d188e10..ed1e986 100644 > --- a/arch/i386/pci/common.c > +++ b/arch/i386/pci/common.c > @@ -27,6 +27,8 @@ unsigned long pirq_table_addr; > struct pci_bus *pci_root_bus; > struct pci_raw_ops *raw_pci_ops; > > +struct pci_sysdata pci_default_sysdata = { .node = -1 }; > + > static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value) > { > return raw_pci_ops->read(0, bus->number, devfn, where, size, value); > diff --git a/arch/i386/pci/fixup.c b/arch/i386/pci/fixup.c > index e7306db..80365ac 100644 > --- a/arch/i386/pci/fixup.c > +++ b/arch/i386/pci/fixup.c > @@ -25,9 +25,11 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d) > pci_read_config_byte(d, reg++, &subb); > DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb); > if (busno) > - pci_scan_bus(busno, &pci_root_ops, NULL); /* Bus A */ > + pci_scan_bus(busno, &pci_root_ops, > + &pci_default_sysdata); /* Bus A */ > if (suba < subb) > - pci_scan_bus(suba+1, &pci_root_ops, NULL); /* Bus B */ > + pci_scan_bus(suba+1, &pci_root_ops, > + &pci_default_sysdata); /* Bus B */ > } > pcibios_last_bus = -1; > } > @@ -42,7 +44,7 @@ static void __devinit pci_fixup_i450gx(struct pci_dev *d) > u8 busno; > pci_read_config_byte(d, 0x4a, &busno); > printk(KERN_INFO "PCI: i440KX/GX host bridge %s: secondary bus %02x\n", pci_name(d), busno); > - pci_scan_bus(busno, &pci_root_ops, NULL); > + pci_scan_bus(busno, &pci_root_ops, &pci_default_sysdata); > pcibios_last_bus = -1; > } > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82454GX, pci_fixup_i450gx); > diff --git a/arch/i386/pci/numa.c b/arch/i386/pci/numa.c > index adbe17a..8d2fd6c 100644 > --- a/arch/i386/pci/numa.c > +++ b/arch/i386/pci/numa.c > @@ -97,9 +97,11 @@ static void __devinit pci_fixup_i450nx(struct pci_dev *d) > pci_read_config_byte(d, reg++, &subb); > DBG("i450NX PXB %d: %02x/%02x/%02x\n", pxb, busno, suba, subb); > if (busno) > - pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, NULL); /* Bus A */ > + pci_scan_bus(QUADLOCAL2BUS(quad,busno), &pci_root_ops, > + &pci_default_sysdata); /* Bus A */ > if (suba < subb) > - pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, NULL); /* Bus B */ > + pci_scan_bus(QUADLOCAL2BUS(quad,suba+1), &pci_root_ops, > + &pci_default_sysdata); /* Bus B */ > } > pcibios_last_bus = -1; > } > @@ -124,7 +126,7 @@ static int __init pci_numa_init(void) > printk("Scanning PCI bus %d for quad %d\n", > QUADLOCAL2BUS(quad,0), quad); > pci_scan_bus(QUADLOCAL2BUS(quad,0), > - &pci_root_ops, NULL); > + &pci_root_ops, &pci_default_sysdata); > } > return 0; > } > diff --git a/arch/i386/pci/visws.c b/arch/i386/pci/visws.c > index f1b486d..2539371 100644 > --- a/arch/i386/pci/visws.c > +++ b/arch/i386/pci/visws.c > @@ -101,8 +101,8 @@ static int __init pcibios_init(void) > "bridge B (PIIX4) bus: %u\n", pci_bus1, pci_bus0); > > raw_pci_ops = &pci_direct_conf1; > - pci_scan_bus(pci_bus0, &pci_root_ops, NULL); > - pci_scan_bus(pci_bus1, &pci_root_ops, NULL); > + pci_scan_bus(pci_bus0, &pci_root_ops, &pci_default_sysdata); > + pci_scan_bus(pci_bus1, &pci_root_ops, &pci_default_sysdata); > pci_fixup_irqs(visws_swizzle, visws_map_irq); > pcibios_resource_survey(); > return 0; > diff --git a/include/asm-i386/pci.h b/include/asm-i386/pci.h > index d790343..7003604 100644 > --- a/include/asm-i386/pci.h > +++ b/include/asm-i386/pci.h > @@ -7,6 +7,7 @@ > struct pci_sysdata { > int node; /* NUMA node */ > }; > +extern struct pci_sysdata pci_default_sysdata; > > #include /* for struct page */ > > diff --git a/include/asm-x86_64/pci.h b/include/asm-x86_64/pci.h > index 88926eb..2c2b092 100644 > --- a/include/asm-x86_64/pci.h > +++ b/include/asm-x86_64/pci.h > @@ -9,6 +9,7 @@ struct pci_sysdata { > int node; /* NUMA node */ > void* iommu; /* IOMMU private data */ > }; > +extern struct pci_sysdata pci_default_sysdata; > > #ifdef CONFIG_CALGARY_IOMMU > static inline void* pci_iommu(struct pci_bus *bus) Acked-by: Yinghai Lu anyway i still think we should put .node and .iommu in pci_bus directly. YH - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/