Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932295AbXKHXyX (ORCPT ); Thu, 8 Nov 2007 18:54:23 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1763422AbXKHXyP (ORCPT ); Thu, 8 Nov 2007 18:54:15 -0500
Received: from idcmail-mo1so.shaw.ca ([24.71.223.10]:22630 "EHLO pd2mo3so.prod.shaw.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1763420AbXKHXyO (ORCPT ); Thu, 8 Nov 2007 18:54:14 -0500
Date: Thu, 08 Nov 2007 17:51:37 -0600
From: Robert Hancock
Subject: Re: How do I debug PCI resource allocation problems
In-reply-to:
To: Rainer Koenig
Cc: linux-kernel@vger.kernel.org
Message-id: <4733A109.8070405@shaw.ca>
MIME-version: 1.0
Content-type: text/plain; charset=ISO-8859-1; format=flowed
Content-transfer-encoding: 7bit
References:
User-Agent: Thunderbird 2.0.0.6 (Windows/20070728)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 23062
Lines: 483

Rainer Koenig wrote:
> This will get long, sorry. But I'm a bit desperate because I encounter strange
> problems on a new mainboard with Intel Q35 chipset and a shared memory
> graphics card. The logs and data I use here are from a SLED10 SP1 (x86_64)
> installation, but the problem occurs whatever distribution I try out.
>
> Ok, short description of the problem:
>
> I run 64-bit Linux using 2 GB of RAM, no problem at all. Then I turn off the
> machine, add 2 more GB so that now I have 4 GB of RAM. Turning it on I see
> the splashscreen of the boot loader, starting the kernel turns the screen
> black and that's it. The machine comes up, I can even ssh to it over the net.
> That is how I obtained the following data.
>
> First a look at the boot.msg log. I will point out places of interest (at
> least the lines I think that are important for further analysis).
> > ----------8<-boot.msg--start---------------------------------------------------------- > Inspecting /boot/System.map-2.6.16.46-0.12-smp > Loaded 23173 symbols from /boot/System.map-2.6.16.46-0.12-smp. > Symbols match kernel version 2.6.16. > No module symbols loaded - kernel modules not enabled. > > klogd 1.4.1, log source = ksyslog started. > <4>Bootdata ok (command line is > root=/dev/disk/by-id/scsi-SATA_ST3160815AS_9RX01AP0-part5 vga=0x31a > resume=/dev/sda1 splash=silent) > <5>Linux version 2.6.16.46-0.12-smp (geeko@buildhost) (gcc version 4.1.2 > 20070115 (prerelease) (SUSE Linux)) #1 SMP Thu May 17 14:00:09 UTC 2007 > <6>BIOS-provided physical RAM map: > <4> BIOS-e820: 0000000000000000 - 000000000009d800 (usable) > <4> BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved) > <4> BIOS-e820: 00000000000ce000 - 00000000000d0000 (reserved) > <4> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved) > <4> BIOS-e820: 0000000000100000 - 00000000df5d0000 (usable) > <4> BIOS-e820: 00000000df5d0000 - 00000000df5dc000 (ACPI data) > <4> BIOS-e820: 00000000df5dc000 - 00000000df5df000 (ACPI NVS) > <4> BIOS-e820: 00000000df5df000 - 00000000df700000 (reserved) > <4> BIOS-e820: 00000000df800000 - 00000000e0100000 (reserved) > <4> BIOS-e820: 00000000f8000000 - 00000000fc000000 (reserved) > <4> BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) > <4> BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) > <4> BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved) > > ********* Comment: The following 2 lines are added when I upgrade from 2 GB to > 4 GB. > > <4> BIOS-e820: 0000000100000000 - 000000011a000000 (usable) > <4> BIOS-e820: 000000011a000000 - 000000011c000000 (reserved) > <6>DMI present. 
> <7>ACPI: RSDP (v000 PTLTD ) @ > 0x00000000000f7240 > <7>ACPI: RSDT (v001 PTLTD RSDT 0x00060000 LTP 0x00000000) @ > 0x00000000df5d6d0b > <7>ACPI: FADT (v001 FSC 0x00060000 0x000f4240) @ > 0x00000000df5dba5f > <7>ACPI: TCPA (v001 Phoeni x 0x00060000 TL 0x00000000) @ > 0x00000000df5dbad3 > <7>ACPI: _MAR (v001 Intel OEMDMAR 0x00060000 LOHR 0x00000001) @ > 0x00000000df5dbb05 > <7>ACPI: SSDT (v001 FSC PST_CPU0 0x00060000 CSF 0x00000001) @ > 0x00000000df5dbb35 > <7>ACPI: SSDT (v001 FSC PST_CPU1 0x00060000 CSF 0x00000001) @ > 0x00000000df5dbbeb > <7>ACPI: SSDT (v001 FSC PST_CPU2 0x00060000 CSF 0x00000001) @ > 0x00000000df5dbca1 > <7>ACPI: SSDT (v001 FSC PST_CPU3 0x00060000 CSF 0x00000001) @ > 0x00000000df5dbd57 > <7>ACPI: SPCR (v001 PTLTD $UCRTBL$ 0x00060000 PTL 0x00000001) @ > 0x00000000df5dbe0d > <7>ACPI: MCFG (v001 PTLTD MCFG 0x00060000 LTP 0x00000000) @ > 0x00000000df5dbe5d > <7>ACPI: HPET (v001 PTLTD HPETTBL 0x00060000 LTP 0x00000001) @ > 0x00000000df5dbe99 > <7>ACPI: MADT (v001 PTLTD APIC 0x00060000 LTP 0x00000000) @ > 0x00000000df5dbed1 > <7>ACPI: BOOT (v001 PTLTD $SBFTBL$ 0x00060000 LTP 0x00000001) @ > 0x00000000df5dbf55 > <7>ACPI: ASF! 
(v016 CETP CETP 0x00060000 PTL 0x00000001) @ > 0x00000000df5dbf7d > <7>ACPI: DSDT (v001 FSC D2587/A1 0x00060000 MSFT 0x03000001) @ > 0x0000000000000000 > <6>No NUMA configuration found > <6>Faking a node at 0000000000000000-000000011a000000 > <6>Bootmem setup node 0 0000000000000000-000000011a000000 > <7>On node 0 totalpages: 1004539 > <7> DMA zone: 2979 pages, LIFO batch:0 > <7> DMA32 zone: 896520 pages, LIFO batch:31 > <7> Normal zone: 105040 pages, LIFO batch:31 > <7> HighMem zone: 0 pages, LIFO batch:0 > <6>ACPI: PM-Timer IO Port: 0x1008 > <7>ACPI: Local APIC address 0xfee00000 > <6>ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) > <6>Processor #0 6:15 APIC version 20 > <6>ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) > <6>Processor #1 6:15 APIC version 20 > <6>ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) > <6>Processor #2 6:15 APIC version 20 > <6>ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) > <6>Processor #3 6:15 APIC version 20 > <6>ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) > <6>ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) > <6>ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) > <6>ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) > <6>ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0]) > <6>IOAPIC[0]: apic_id 4, version 32, address 0xfec00000, GSI 0-23 > <6>ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) > <6>ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) > <7>ACPI: IRQ0 used by override. > <7>ACPI: IRQ2 used by override. > <7>ACPI: IRQ9 used by override. 
> <6>Setting APIC routing to physical flat > <6>ACPI: HPET id: 0xffffffff base: 0xfed00000 > <6>Using ACPI (MADT) for SMP configuration information > <6>Allocating PCI resources starting at e2000000 (gap: e0100000:17f00000) > <6>SMP: Allowing 4 CPUs, 0 hotplug CPUs > <4>Built 1 zonelists > <5>Kernel command line: > root=/dev/disk/by-id/scsi-SATA_ST3160815AS_9RX01AP0-part5 vga=0x31a > resume=/dev/sda1 splash=silent > <6>bootsplash: silent mode. > <4>Initializing CPU#0 > <4>PID hash table entries: 4096 (order: 12, 131072 bytes) > <6>time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer. > <6>time.c: Detected 2394.006 MHz processor. > <4>Console: colour dummy device 80x25 > <4>Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) > <4>Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) > <4>Checking aperture... > <6>PCI-DMA: Using software bounce buffering for IO (SWIOTLB) > <6>Placing software IO TLB between 0x166f000 - 0x566f000 > <4>Nosave address range: 000000000009d000 - 000000000009e000 > <4>Nosave address range: 000000000009e000 - 00000000000a0000 > <4>Nosave address range: 00000000000a0000 - 00000000000ce000 > <4>Nosave address range: 00000000000ce000 - 00000000000d0000 > <4>Nosave address range: 00000000000d0000 - 00000000000e0000 > <4>Nosave address range: 00000000000e0000 - 0000000000100000 > <4>Nosave address range: 00000000df5d0000 - 00000000df5dc000 > <4>Nosave address range: 00000000df5dc000 - 00000000df5df000 > <4>Nosave address range: 00000000df5df000 - 00000000df700000 > <4>Nosave address range: 00000000df700000 - 00000000df800000 > <4>Nosave address range: 00000000df800000 - 00000000e0100000 > <4>Nosave address range: 00000000e0100000 - 00000000f8000000 > <4>Nosave address range: 00000000f8000000 - 00000000fc000000 > <4>Nosave address range: 00000000fc000000 - 00000000fec00000 > <4>Nosave address range: 00000000fec00000 - 00000000fec10000 > <4>Nosave address range: 00000000fec10000 - 00000000fee00000 > <4>Nosave address 
range: 00000000fee00000 - 00000000fee01000 > <4>Nosave address range: 00000000fee01000 - 00000000ffb00000 > <4>Nosave address range: 00000000ffb00000 - 0000000100000000 > <4>Memory: 3942960k/4620288k available (1918k kernel code, 142212k reserved, > 886k data, 196k init) > <4>Calibrating delay using timer specific routine.. 4792.10 BogoMIPS > (lpj=9584204) > <6>Security Framework v1.0.0 initialized > <4>Mount-cache hash table entries: 256 > <6>CPU: L1 I cache: 32K, L1 D cache: 32K > <6>CPU: L2 cache: 4096K > <4>using mwait in idle threads. > <6>CPU: Physical Processor ID: 0 > <6>CPU: Processor Core ID: 0 > <6>CPU0: Thermal monitoring enabled (TM2) > <6>checking if image is initramfs... it is > <4>Freeing initrd memory: 2638k freed > <4> not found! > <6>activating NMI Watchdog ... done. > <6>Using local APIC timer interrupts. > <4>result 16625020 > <6>Detected 16.625 MHz APIC timer. > <6>Booting processor 1/4 APIC 0x1 > <4>Initializing CPU#1 > <4>Calibrating delay using timer specific routine.. 4788.04 BogoMIPS > (lpj=9576089) > <6>CPU: L1 I cache: 32K, L1 D cache: 32K > <6>CPU: L2 cache: 4096K > <6>CPU: Physical Processor ID: 0 > <6>CPU: Processor Core ID: 1 > <6>CPU1: Thermal monitoring enabled (TM2) > <4>Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz stepping 0b > <6>Booting processor 2/4 APIC 0x2 > <4>Initializing CPU#2 > <4>Calibrating delay using timer specific routine.. 4788.10 BogoMIPS > (lpj=9576217) > <6>CPU: L1 I cache: 32K, L1 D cache: 32K > <6>CPU: L2 cache: 4096K > <6>CPU: Physical Processor ID: 0 > <6>CPU: Processor Core ID: 2 > <6>CPU2: Thermal monitoring enabled (TM2) > <4>Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz stepping 0b > <6>Booting processor 3/4 APIC 0x3 > <4>Initializing CPU#3 > <4>Calibrating delay using timer specific routine.. 
4788.10 BogoMIPS > (lpj=9576210) > <6>CPU: L1 I cache: 32K, L1 D cache: 32K > <6>CPU: L2 cache: 4096K > <6>CPU: Physical Processor ID: 0 > <6>CPU: Processor Core ID: 3 > <6>CPU3: Thermal monitoring enabled (TM2) > <4>Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz stepping 0b > <6>Brought up 4 CPUs > <6>testing NMI watchdog ... OK. > <4>migration_cost=10,3344 > <6>NET: Registered protocol family 16 > <6>ACPI: bus type pci registered > <6>PCI: Using configuration type 1 > <3>PCI: BIOS Bug: MCFG area is not E820-reserved > <3>PCI: Not using MMCONFIG. > <6>ACPI: Subsystem revision 20060127 > <6>ACPI: Interpreter enabled > <6>ACPI: Using IOAPIC for interrupt routing > <6>ACPI: PCI Root Bridge [PCI0] (0000:00) > <7>PCI: Probing PCI hardware (bus 00) > <7>Boot video device is 0000:00:02.0 > <6>PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2 > <6>PCI: Transparent bridge - 0000:00:1e.0 > <7>ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] > <7>ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEXA._PRT] > <7>ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEXB._PRT] > <7>ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEXC._PRT] > <7>ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCIH._PRT] > <4>ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 7 9 10 *11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 7 9 10 11 12 14 15) *5 > <4>ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 7 9 *10 11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 7 9 10 *11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 7 *9 10 11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 7 9 10 *11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 7 9 10 *11 12 14 15) > <4>ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 7 9 10 *11 12 14 15) > <4>ACPI: Device [ECP] status [00000008]: functional but not present; setting > present > <4>ACPI: Device [COM2] status [00000008]: functional but not present; setting > present > <6>PCI: Using ACPI for IRQ routing > <6>PCI: If a device doesn't work, try 
"pci=routeirq". If it helps, post a > report > <3>PCI: Cannot allocate resource region 7 of bridge 0000:00:01.0 > <3>PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.0 > <3>PCI: Cannot allocate resource region 7 of bridge 0000:00:1c.4 > <3>PCI: Cannot allocate resource region 2 of device 0000:00:02.0 > > ************* Question: How do I get more info on the messages above? How can > I debug that and find a root cause for the messages? 0000:00:02.0 is the VGA > device that is giving me those headaches BTW. > > <6>hpet0: at MMIO 0xfed00000 (virtual 0xffffffffff5fe000), IRQs 2, 8, 0, 0 > <6>hpet0: 4 64-bit timers, 14318180 Hz > <4>PCI: Ignore bogus resource 6 [0:0] of 0000:00:02.0 > <6>PCI: Bridge: 0000:00:01.0 > <6> IO window: disabled. > <6> MEM window: disabled. > <6> PREFETCH window: disabled. > <6>PCI: Bridge: 0000:00:1c.0 > <6> IO window: disabled. > <6> MEM window: disabled. > <6> PREFETCH window: disabled. > <6>PCI: Bridge: 0000:00:1c.4 > <6> IO window: disabled. > <6> MEM window: disabled. > <6> PREFETCH window: disabled. > <6>PCI: Bridge: 0000:00:1e.0 > <6> IO window: disabled. > <6> MEM window: disabled. > <6> PREFETCH window: disabled. 
> <6>GSI 16 sharing vector 0xA9 and IRQ 16 > <6>ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169 > <7>PCI: Setting latency timer of device 0000:00:01.0 to 64 > <6>GSI 17 sharing vector 0xB1 and IRQ 17 > <6>ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 18 (level, low) -> IRQ 177 > <7>PCI: Setting latency timer of device 0000:00:1c.0 to 64 > <6>GSI 18 sharing vector 0xB9 and IRQ 18 > <6>ACPI: PCI Interrupt 0000:00:1c.4[B] -> GSI 22 (level, low) -> IRQ 185 > <7>PCI: Setting latency timer of device 0000:00:1c.4 to 64 > <7>PCI: Setting latency timer of device 0000:00:1e.0 to 64 > <6>Simple Boot Flag at 0x43 set to 0x1 > <4>IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $ > <6>audit: initializing netlink socket (disabled) > <5>audit(1194361547.452:1): initialized > <4>Total HugeTLB memory allocated, 0 > <5>VFS: Disk quotas dquot_6.5.1 > <4>Dquot-cache hash table entries: 512 (order 0, 4096 bytes) > <6>Initializing Cryptographic API > <6>io scheduler noop registered > <6>io scheduler anticipatory registered > <6>io scheduler deadline registered > <6>io scheduler cfq registered (default) > <6>ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 169 > <7>PCI: Setting latency timer of device 0000:00:01.0 to 64 > <4>assign_interrupt_mode Found MSI capability > <7>Allocate Port Service[0000:00:01.0:pcie00] > <6>ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 18 (level, low) -> IRQ 177 > <7>PCI: Setting latency timer of device 0000:00:1c.0 to 64 > <4>assign_interrupt_mode Found MSI capability > <7>Allocate Port Service[0000:00:1c.0:pcie00] > <7>Allocate Port Service[0000:00:1c.0:pcie02] > <6>ACPI: PCI Interrupt 0000:00:1c.4[B] -> GSI 22 (level, low) -> IRQ 185 > <7>PCI: Setting latency timer of device 0000:00:1c.4 to 64 > <4>assign_interrupt_mode Found MSI capability > <7>Allocate Port Service[0000:00:1c.4:pcie00] > <7>Allocate Port Service[0000:00:1c.4:pcie02] > <4>vesafb: cannot reserve video memory at 0xe0000000 > > ********* 
Comment: I guess this is why I get a black screen. But why can't
> that memory be reserved?

It looks like the BIOS has already reserved part of that memory range: the
E820 map above marks df800000 - e0100000 as reserved, which overlaps the
first 1 MB at 0xe0000000. The question is why vesafb is trying to reserve
that range in the first place, though.

>
> <6>vesafb: framebuffer at 0xe0000000, mapped to 0xffffc20000080000, using
> 8128k, total 8128k
>
> ************ Comment: The framebuffer address with 0xfff... looks pretty
> strange, doesn't it?

No, that's a kernel virtual address (where the framebuffer is mapped), not a
physical one.

>
> <6>vesafb: mode is 1280x1024x16, linelength=2560, pages=2
> <6>vesafb: scrolling: redraw
> <6>vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
> <6>bootsplash 3.1.6-2004/03/31: looking for picture...<6> silentjpeg size
> 65383 bytes,<6>...found (1280x1024, 55554 bytes, v3).
> <4>Console: switching to colour frame buffer device 155x60
> <6>fb0: VESA VGA frame buffer device
> <6>Real Time Clock Driver v1.12ac
> <7>hpet_resources: 0xfed00000 is busy
> <6>Non-volatile memory driver v1.2
> <6>Linux agpgart interface v0.101 (c) Dave Jones
> <6>serio: i8042 AUX port at 0x60,0x64 irq 12
> <6>serio: i8042 KBD port at 0x60,0x64 irq 1
> <6>Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
> <6>serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> <6>GSI 19 sharing vector 0xE1 and IRQ 19
> <6>ACPI: PCI Interrupt 0000:00:03.3[B] -> GSI 17 (level, low) -> IRQ 225
> <6>0000:00:03.3: ttyS1 at I/O 0x1c90 (irq = 225) is a 16550A
> <4>RAMDISK driver initialized: 16 RAM disks of 128000K size 1024 blocksize
> <6>mice: PS/2 mouse device common for all mice
> <6>input: AT Translated Set 2 keyboard as /class/input/input0
> <6>input: PC Speaker as /class/input/input1
> <6>md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
> <6>md: bitmap version 4.39
> <6>NET: Registered protocol family 2
> <4>IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
> <4>TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
> <4>TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> <6>TCP: Hash tables configured (established 262144 bind 65536) > <6>TCP reno registered > <6>NET: Registered protocol family 1 > <4>ACPI wakeup devices: > <4>PEXA PEXX PEXB PEXC PEXD USB1 USB2 USB3 USB4 USB5 USB6 USB7 USB8 USB9 LAN > PCIH KEYB PS2M COM1 COM2 > <6>ACPI: (supports S0 S1 S3 S4 S5) > <4>Freeing unused kernel memory: 196k freed > <6>Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 > <6>ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx > <5>SCSI subsystem initialized > <7>libata version 2.00 loaded. > <7>ata_piix 0000:00:1f.2: version 2.00ac7 > <6>ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] > <6>GSI 20 sharing vector 0xE9 and IRQ 20 > ----------8<-boot.msg--end----------------------------------------------------------- > > Ok, so far so good. Looks like my problems come from the PCI resources of the > VGA card. So I did a lspci on that device and got another surprise: > > 00:02.0 VGA compatible controller: Intel Corporation Integrated Graphics > Controller (rev 02) (prog-if 00 [VGA]) > Subsystem: Fujitsu Siemens Computer GmbH Unknown device 10fc > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- > SERR- FastB2B- > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0 > Interrupt: pin A routed to IRQ 11 > Region 0: Memory at f0100000 (32-bit, non-prefetchable) [size=512K] > Region 1: I/O ports at 1c70 [size=8] > Region 2: Memory at 120000000 (32-bit, prefetchable) [size=256M] > Region 3: Memory at f0000000 (32-bit, non-prefetchable) [size=1M] > Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 > Enable- > Address: 00000000 Data: 0000 > Capabilities: [d0] Power Management version 2 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > 00: 86 80 b2 29 07 00 90 00 02 00 00 03 00 00 00 00 > 10: 00 00 10 f0 71 1c 00 00 08 00 00 20 00 00 00 f0 > 20: 00 00 00 00 00 00 00 00 00 00 00 
00 34 17 fc 10
> 30: 00 00 00 00 90 00 00 00 00 00 00 00 0b 01 00 00
> 40: 09 00 0b b1 62 00 a0 c8 46 04 16 00 00 00 00 00
> 50: 00 00 30 01 4b 03 00 00 00 00 00 00 00 00 80 df
> 60: 00 00 02 02 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 90: 05 d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> a0: 11 11 00 00 00 00 06 03 00 00 00 00 00 00 00 00
> b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 01 00 22 00 00 00 00 00 00 00 00 00 00 01 02 00
> e0: 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00
> f0: 12 00 03 00 00 00 00 00 90 0f 03 00 d1 cd 5d df
>
> Question: Memory region 2 at 120000000? That is beyond the 4GB boundary and
> the BIOS guys I know told me that every PCI IOMEM region should reside within
> the first 4 GBs! When running the machine with 2 GB only lspci output looks
> like this for the VGA device:

64-bit capable PCI devices can indeed have BARs located above 4GB. However,
I can't see why lspci is detecting that from this configuration space: the
BAR contents for region 2 are 20000008, which means prefetchable memory at
0x20000000 that can be located anywhere within 32-bit memory space. That
doesn't make any sense, though, since that's in the middle of RAM! Quite
likely this bogus resource setting of the graphics controller is a large
part of your problem. The question is who's doing this.

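To make the BAR reading above easier to check by hand, here is a minimal Python sketch (not part of the original thread). It decodes row 0x10 of the quoted config-space dump and tests each memory BAR against the large usable RAM range from the quoted E820 map. The names decode_bar, parse_dump_row, in_usable_ram and USABLE_RAM are made up for illustration, and the sketch assumes only 32-bit BARs (a 64-bit BAR would occupy two consecutive dwords, which is not handled here).

```python
# Illustrative sketch only: these helpers are hypothetical names,
# not kernel or lspci code.

# Row 0x10 of the config-space dump quoted above (BAR0..BAR3).
ROW_10 = "10: 00 00 10 f0 71 1c 00 00 08 00 00 20 00 00 00 f0"

# Large usable RAM range below 4 GB from the quoted E820 map: [start, end).
USABLE_RAM = (0x00100000, 0xDF5D0000)


def parse_dump_row(row):
    """Turn one 'NN: xx xx ...' dump row into four little-endian dwords."""
    data = bytes.fromhex(row.split(":", 1)[1].replace(" ", ""))
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, 16, 4)]


def decode_bar(raw):
    """Decode a 32-bit BAR value from its low flag bits."""
    if raw & 0x1:  # bit 0 set: I/O space; bits 1:0 are not address bits
        return {"space": "io", "base": raw & 0xFFFFFFFC}
    return {
        "space": "mem",
        "base": raw & 0xFFFFFFF0,               # bits 3:0 are flags
        "is_64bit": ((raw >> 1) & 0x3) == 0x2,  # bits 2:1: 00=32-bit, 10=64-bit
        "prefetchable": bool(raw & 0x8),        # bit 3
    }


def in_usable_ram(base):
    """True if base falls inside the usable RAM range given above."""
    start, end = USABLE_RAM
    return start <= base < end


bars = [decode_bar(raw) for raw in parse_dump_row(ROW_10)]
for n, bar in enumerate(bars):
    clash = bar["space"] == "mem" and in_usable_ram(bar["base"])
    note = " (inside usable RAM!)" if clash else ""
    print(f"Region {n}: {bar['space']} at {bar['base']:#x}{note}")
```

With the dump from the 4 GB boot, region 2 decodes to 32-bit prefetchable memory at 0x20000000, which lies inside the usable range 00100000 - df5d0000. The value e0000008 reported for the 2 GB case decodes to 0xe0000000, which is outside that range.
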
>
> 00:02.0 Class 0300: 8086:29b2 (rev 02) (prog-if 00 [VGA])
> Subsystem: 1734:10fc
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping-
> SERR- FastB2B-
> Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0
> Interrupt: pin A routed to IRQ 11
> Region 0: Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
> Region 1: I/O ports at 1c70 [size=8]
> Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M]
> Region 3: Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
> Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0
> Enable-
> Address: 00000000 Data: 0000
> Capabilities: [d0] Power Management version 2
> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> That means now we get the region 2 at e0000000 and everything is fine.

This looks more reasonable, though the BAR is still mapped on top of a
region that's reserved in the E820 memory map, which is wrong.

>
> I tried to have a look at the PCI config space with a DOS tool from the year
> 1999, the hexdump looked pretty much similar, but byte 1A changed from "20"
> to "e0". And the region was declared as e0000008, which also looked strange
> to me.

That looks more reasonable (e0000000, not 20000000). The low 4 bits of a
memory BAR encode the address space type and the prefetchable flag, so the
trailing 8 just means prefetchable 32-bit memory. The question is how the
BAR got set to the bogus value under Linux.

>
> This is where I am now. I also tried with an Intel reference mainboard (same
> chipset, different BIOS) and this one didn't show the problem. That makes me
> think that the problem is somewhere in the BIOS, but where? I have access to
> the BIOS developers, but they don't know much about Linux and since the other
> operating systems from Redmond are running without problems they are hard to
> convince that they made a mistake.
:-)
>
> Another very bad side effect of the problem is that when the machine runs on
> 32-bit Linux then the graphic card seems to work, but people report corrupted
> file systems after a while. I guess that is related to my problem on 64 bit,
> only that in the 32-bit case then filesystem buffers got overwritten by the
> video RAM and when they are written back to disk... ouch!
>
> I also tried to track the problem down with the CD from LinuxFirmwareKit.org,
> but the resource allocation errors are the same and unfortunately the lack of
> verbosity as well.
>
> Ok, that's it. Any help is much appreciated.
> Thanks
> Rainer

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/