2010-07-14 01:02:10

by Ben Greear

Subject: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

We're seeing boot failures on multiple machines, running FC8 and
F11. I bisected on an FC8 32-bit system. Newer hardware works,
but these older ones do not.

A console log of the hang is found later in this email.

Please let me know if you would like any additional information,
and I will be happy to test patches.

The same failure happens in 2.6.34.1, so the fix does not appear to
be in the stable tree yet.


[greearb@fs2 linux-2.6]$ git bisect good
a712ffbc199849364c46e9112b93b66de08e2c26 is first bad commit
commit a712ffbc199849364c46e9112b93b66de08e2c26
Author: Jesse Barnes <[email protected]>
Date: Thu Feb 4 10:59:27 2010 -0800

x86/PCI: Moorestown PCI support

The Moorestown platform only has a few devices that actually support
PCI config cycles. The rest of the devices use an in-RAM MCFG space
for the purposes of device enumeration and initialization.

There are a few uglies in the fake support, like BAR sizes that aren't
a power of two, sizing detection, and writes to the real devices, but
other than that it's pretty straightforward.

Another way to think of this is not really as PCI at all, but just a
table in RAM describing which devices are present, their capabilities
and their offsets in MMIO space. This could have been done with a
special new firmware table on this platform, but given that we do have
some real PCI devices too, simply describing things in an MCFG type
space was pretty simple.

Signed-off-by: Jesse Barnes <[email protected]>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D08@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>

:040000 040000 56d09488bc71d1ab844bc02363cc75f89426a7ef 9bf1a608677af8b32fa4dbae8ba8cf965f4fb4e8 M arch
:040000 040000 a4cfa1da638f4870cb0c32f4a34a9acfb4157f02 4283c36b6f856e0bf99a9ef9d6db44d91bd44bb2 M include



### Console log of failure ###

root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /ct2.6.33-rc8-compat.img ro root=/dev/VolGroup00/LogVol00 console=ttyS0,
38400
[Linux-bzImage, setup=0x3600, size=0x330d70]
initrd /initrd-ct2.6.33-rc8-compat.img
[Linux-initrd @ 0x37cac000, 0x343add bytes]

Initializing cgroup subsys cpuset
Linux version 2.6.33-rc8-compat ([email protected]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #13 SMP Tue Jul 13 17:0
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cff50000 (usable)
BIOS-e820: 00000000cff50000 - 00000000cff65000 (ACPI data)
BIOS-e820: 00000000cff65000 - 00000000cff80000 (ACPI NVS)
BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
DMI present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
last_pfn = 0xcff50 max_arch_pfn = 0x100000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
total RAM covered: 8191M
Found optimal setting for mtrr clean up
gran_size: 64K chunk_size: 1M num_reg: 7 lose cover RAM: 0G
found SMP MP-table at [c00f6200] f6200
init_memory_mapping: 0000000000000000-00000000373fe000
RAMDISK: 37cac000 - 37fefadd
Allocated new RAMDISK: 00ab4000 - 00df7add
Move RAMDISK from 0000000037cac000 - 0000000037fefadc to 00ab4000 - 00df7adc
ACPI: RSDP 000f61d0 00014 (v00 PTLTD )
ACPI: RSDT cff5ea41 00064 (v01 SMCI SMCISLP2 06040000 LTP 00000000)
ACPI: FACP cff6448a 00074 (v01 INTEL TUMWATER 06040000 PTL 00000003)
ACPI: DSDT cff60577 03F13 (v01 Intel BLAKFORD 06040000 MSFT 0100000E)
ACPI: FACS cff65fc0 00040
ACPI: APIC cff644fe 000C8 (v01 PTLTD ? APIC 06040000 LTP 00000000)
ACPI: MCFG cff645c6 0003C (v01 PTLTD MCFG 06040000 LTP 00000000)
ACPI: HPET cff64602 00038 (v01 PTLTD HPETTBL 06040000 LTP 00000001)
ACPI: BOOT cff6463a 00028 (v01 PTLTD $SBFTBL$ 06040000 LTP 00000001)
ACPI: SPCR cff64662 00050 (v01 PTLTD $UCRTBL$ 06040000 PTL 00000001)
ACPI: SLIC cff646b2 00176 (v01 SMCI SMCISLP2 06040000 LTP 00000000)
ACPI: SSDT cff60318 0025F (v01 PmRef Cpu0Tst 00003000 INTL 20050228)
ACPI: SSDT cff60272 000A6 (v01 PmRef Cpu7Tst 00003000 INTL 20050228)
ACPI: SSDT cff601cc 000A6 (v01 PmRef Cpu6Tst 00003000 INTL 20050228)
ACPI: SSDT cff60126 000A6 (v01 PmRef Cpu5Tst 00003000 INTL 20050228)
ACPI: SSDT cff60080 000A6 (v01 PmRef Cpu4Tst 00003000 INTL 20050228)
ACPI: SSDT cff5ffda 000A6 (v01 PmRef Cpu3Tst 00003000 INTL 20050228)
ACPI: SSDT cff5ff34 000A6 (v01 PmRef Cpu2Tst 00003000 INTL 20050228)
ACPI: SSDT cff5fe8e 000A6 (v01 PmRef Cpu1Tst 00003000 INTL 20050228)
ACPI: SSDT cff5eaa5 013E9 (v01 PmRef CpuPm 00003000 INTL 20050228)
2443MB HIGHMEM available.
883MB LOWMEM available.
mapped low ram: 0 - 373fe000
low ram: 0 - 373fe000
node 0 low ram: 00000000 - 373fe000
node 0 bootmap 00017000 - 0001de80
(14 early reservations) ==> bootmem [0000000000 - 00373fe000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000001000 - 0000002000] EX TRAMPOLINE ==> [0000001000 - 0000002000]
#2 [0000400000 - 0000aaec40] TEXT DATA BSS ==> [0000400000 - 0000aaec40]
#3 [0000aaf000 - 0000ab3174] BRK ==> [0000aaf000 - 0000ab3174]
#4 [00000f6210 - 0000100000] BIOS reserved ==> [00000f6210 - 0000100000]
#5 [00000f6200 - 00000f6210] MP-table mpf ==> [00000f6200 - 00000f6210]
#6 [000009c000 - 000009e431] BIOS reserved ==> [000009c000 - 000009e431]
#7 [000009e695 - 00000f6200] BIOS reserved ==> [000009e695 - 00000f6200]
#8 [000009e431 - 000009e695] MP-table mpc ==> [000009e431 - 000009e695]
#9 [0000010000 - 0000011000] TRAMPOLINE ==> [0000010000 - 0000011000]
#10 [0000011000 - 0000015000] ACPI WAKEUP ==> [0000011000 - 0000015000]
#11 [0000015000 - 0000017000] PGTABLE ==> [0000015000 - 0000017000]
#12 [0000ab4000 - 0000df7add] NEW RAMDISK ==> [0000ab4000 - 0000df7add]
#13 [0000017000 - 000001e000] BOOTMAP ==> [0000017000 - 000001e000]
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
Normal 0x00001000 -> 0x000373fe
HighMem 0x000373fe -> 0x000cff50
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0: 0x00000010 -> 0x0000009c
0: 0x00000100 -> 0x000cff50
Using APIC driver default
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x04] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x05] enabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x06] enabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a201 base: 0xfed00000
SMP: Allowing 8 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009c000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000cc000
PM: Registered nosave memory: 00000000000cc000 - 00000000000d0000
PM: Registered nosave memory: 00000000000d0000 - 00000000000e4000
PM: Registered nosave memory: 00000000000e4000 - 0000000000100000
Allocating PCI resources starting at d0000000 (gap: d0000000:10000000)
Booting paravirtualized kernel on bare hardware
setup_percpu: NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:8 nr_node_ids:1
PERCPU: Embedded 14 pages/cpu @c2c00000 s35352 r0 d21992 u524288
pcpu-alloc: s35352 r0 d21992 u524288 alloc=1*4194304
pcpu-alloc: [0] 0 1 2 3 4 5 6 7
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 845021
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0, 38400
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
allocated 17035520 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Initializing HighMem for node 0 (000373fe:000cff50)
Memory: 3350528k/3407168k available (3472k kernel code, 54872k reserved, 2165k data, 448k init, 2501960k highmem)
virtual kernel memory layout:
fixmap : 0xffd54000 - 0xfffff000 (2732 kB)
pkmap : 0xff400000 - 0xff800000 (4096 kB)
vmalloc : 0xf7bfe000 - 0xff3fe000 ( 120 MB)
lowmem : 0xc0000000 - 0xf73fe000 ( 883 MB)
.init : 0xc0982000 - 0xc09f2000 ( 448 kB)
.data : 0xc07642f3 - 0xc09819a8 (2165 kB)
.text : 0xc0400000 - 0xc07642f3 (3472 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
Hierarchical RCU implementation.
NR_IRQS:1280
Console: colour VGA+ 80x25
console [ttyS0] enabled
Fast TSC calibration using PIT
Detected 2000.006 MHz processor.
Calibrating delay loop (skipped), value calculated using timer frequency.. 4000.01 BogoMIPS (lpj=2000006)
Security Framework initialized
SELinux: Initializing.
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
Initializing cgroup subsys blkio
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
mce: CPU supports 6 MCE banks
using mwait in idle threads.
Performance Events: Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
Checking 'hlt' instruction... OK.
ACPI: Core revision 20091214
Enabling APIC mode: Flat. Using 2 I/O APICs
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Xeon(R) CPU E5405 @ 2.00GHz stepping 06
Booting Node 0, Processors #1
Initializing CPU#1
#2
Initializing CPU#2
#3
Initializing CPU#3
#4
Initializing CPU#4
#5
Initializing CPU#5
#6
Initializing CPU#6
#7 Ok.
Initializing CPU#7
Brought up 8 CPUs
Total of 8 processors activated (32000.11 BogoMIPS).
devtmpfs: initialized
regulator: core version 0.5
Time: 0:13:15 Date: 07/14/10
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-0d] at [mem 0xe0000000-0xe0dfffff] (base 0xe0000000)
PCI: MMCONFIG at [mem 0xe0000000-0xe0dfffff] reserved in E820
PCI: Using MMCONFIG for extended config space
PCI: Using configuration type 1 for base access
bio: create slab <bio-0> at 0
ACPI Error (psargs-0359): [Z000] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_._OSC] (Node f6c34498), AE_NOT_FOUND
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci_root PNP0A03:00: ignoring host bridge windows from ACPI; boot with "pci=use_crs" to use them




[root@ice-si-dmz ~]# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
stepping : 6
cpu MHz : 2000.193
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon
pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm tpr_shadow vnmi flexpriority
bogomips : 4000.38
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
.. x8


Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2010-07-14 01:17:20

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/13/2010 05:36 PM, Ben Greear wrote:
> We're seeing boot failures on multiple machines, running FC8 and
> F11. I bisected on an FC8 32-bit system. Newer hardware works,
> but these older ones do not.
>
> A console log of the hang is found later in this email.
>
> Please let me know if you would like any additional information,
> and I will be happy to test patches.
>
> The same failure happens in 2.6.34.1, so the fix does not appear to
> be in the stable tree yet.


I added some printks to the offending code. It seems the problem
is that the fixed_bar_cap method in arch/x86/pci/mrst.c loops forever:

# Endless loop of this spewing to console...

pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..
pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..
pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..
pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..
pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..
pcie_cap: 268435456Checking vendor..


static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
{
	int pos;
	u32 pcie_cap = 0, cap_data;
	printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
	pos = PCIE_CAP_OFFSET;
	while (pos) {
		printk("Before read..\n");
		if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
					  devfn, pos, 4, &pcie_cap))
			return 0;
		printk("pcie_cap: %u", pcie_cap);

		if (pcie_cap == 0xffffffff)
			return 0;

		printk("Checking vendor..\n");
		if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
			printk("reading domain_nr\n");
			raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
					      devfn, pos + 4, 4, &cap_data);
			printk("cap_data: %u\n", cap_data);
			if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
				return pos;
		}

		pos = pcie_cap >> 20;
		printk("pos after shift: %i\n", pos);
	}

	printk("Returning from fixed_bar_cap\n");
	return 0;
}


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
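
The value that keeps repeating in the output above can be decoded by hand. A rough sketch follows; the helper names are made up for illustration. The field layout is the standard PCI Express extended capability header, and the 0xffc mask mirrors the kernel's PCI_EXT_CAP_NEXT(), whereas the loop above shifts without masking.

```c
#include <stdint.h>

/*
 * PCI Express extended capability header layout:
 *   bits 15:0   capability ID
 *   bits 19:16  capability version
 *   bits 31:20  offset of the next capability (dword aligned)
 * Helper names are hypothetical, for illustration only.
 */
unsigned int ext_cap_id(uint32_t header)
{
	return header & 0xffff;
}

unsigned int ext_cap_ver(uint32_t header)
{
	return (header >> 16) & 0xf;
}

unsigned int ext_cap_next(uint32_t header)
{
	return (header >> 20) & 0xffc;
}
```

For 268435456 (0x10000000) this gives ID 0, version 0 and a next offset of 0x100 (256), so the entry read at offset 0x100 names itself as its own successor.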

2010-07-14 01:56:25

by Robert Hancock

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/13/2010 07:17 PM, Ben Greear wrote:
> On 07/13/2010 05:36 PM, Ben Greear wrote:
>> We're seeing boot failures on multiple machines, running FC8 and
>> F11. I bisected on an FC8 32-bit system. Newer hardware works,
>> but these older ones do not.
>>
>> A console log of the hang is found later in this email.
>>
>> Please let me know if you would like any additional information,
>> and I will be happy to test patches.
>>
>> The same failure happens in 2.6.34.1, so the fix does not appear to
>> be in the stable tree yet.
>
>
> I added some printks to the offending code. It seems the problem
> is that the fixed_bar_cap method in arch/x86/pci/mrst.c loops forever:
>
> # Endless loop of this spewing to console...
>
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..

Can you print out bus->number and devfn and look that up in lspci to
find out which device it's hitting? It looks like there's a device with
a PCI Express extended capability header that has an extended capability
ID of 0000h and a next capability offset of 100h, which points to
itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
<= pos then it should give up and break out of the loop, since it means
that the next capability pointer is invalidly pointing to the same or a
previous entry.
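
The bail-out suggested here could look roughly like the following. This is a userspace sketch under stated assumptions, not the mrst.c code: config space is modelled as a plain array of 1024 dwords, the function name and constants are made up for the example, and it only looks for a vendor-specific capability (ID 0x0b) rather than also checking for the fixed-BAR vendor ID.

```c
#include <stdint.h>

#define EXT_CAP_START   0x100	/* first extended capability offset */
#define EXT_CAP_ID_VNDR 0x0b	/* vendor-specific extended capability */

/*
 * Walk the extended capability list in a simulated 4096-byte config
 * space (1024 dwords) and return the offset of the first
 * vendor-specific capability, or 0 if there is none.
 * Hypothetical helper, for illustration only.
 */
int find_vendor_cap(const uint32_t *cfg)
{
	unsigned int pos = EXT_CAP_START;

	while (pos) {
		uint32_t header = cfg[pos / 4];
		unsigned int next;

		if (header == 0xffffffff)
			return 0;

		if ((header & 0xffff) == EXT_CAP_ID_VNDR)
			return pos;

		/* The mask keeps next dword aligned and below 4096. */
		next = (header >> 20) & 0xffc;

		/*
		 * No forward progress: either the end of the list
		 * (next == 0) or a malformed pointer to the same or an
		 * earlier entry. Give up instead of spinning.
		 */
		if (next <= pos)
			break;

		pos = next;
	}

	return 0;
}
```

With a header of 0x10000000 at offset 0x100, as reported in this thread, the walk stops after one read and returns 0 instead of looping.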

2010-07-14 02:20:19

by Jesse Barnes

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On Tue, 13 Jul 2010 18:17:18 -0700
Ben Greear <[email protected]> wrote:

> On 07/13/2010 05:36 PM, Ben Greear wrote:
> > We're seeing boot failures on multiple machines, running FC8 and
> > F11. I bisected on an FC8 32-bit system. Newer hardware works,
> > but these older ones do not.
> >
> > A console log of the hang is found later in this email.
> >
> > Please let me know if you would like any additional information,
> > and I will be happy to test patches.
> >
> > The same failure happens in 2.6.34.1, so the fix does not appear to
> > be in the stable tree yet.
>
>
> I added some printks to the offending code. It seems the problem
> is that the fixed_bar_cap method in arch/x86/pci/mrst.c loops forever:
>
> # Endless loop of this spewing to console...
>
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
> pos after shift: 256
> Before read..
> pcie_cap: 268435456Checking vendor..
>
>
> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
> {
> int pos;
> u32 pcie_cap = 0, cap_data;
> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
> pos = PCIE_CAP_OFFSET;
> while (pos) {
> printk("Before read..\n");
> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> devfn, pos, 4, &pcie_cap))
> return 0;
> printk("pcie_cap: %u", pcie_cap);
>
> if (pcie_cap == 0xffffffff)
> return 0;
>
> printk("Checking vendor..\n");
> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
> printk("reading domain_nr\n");
> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> devfn, pos + 4, 4, &cap_data);
> printk("cap_data: %u\n", cap_data);
> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
> return pos;
> }
>
> pos = pcie_cap >> 20;
> printk("pos after shift: %i\n", pos);
> }
>
> printk("Returning from fixed_bar_cap\n");
> return 0;
> }
>
>

I thought a related bug was fixed already; the code should be returning
all zeros for non-existent BAR reads.

--
Jesse Barnes, Intel Open Source Technology Center

2010-07-14 02:23:07

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/13/2010 06:56 PM, Robert Hancock wrote:
> On 07/13/2010 07:17 PM, Ben Greear wrote:
>> On 07/13/2010 05:36 PM, Ben Greear wrote:
>>> We're seeing boot failures on multiple machines, running FC8 and
>>> F11. I bisected on an FC8 32-bit system. Newer hardware works,
>>> but these older ones do not.
>>>
>>> A console log of the hang is found later in this email.
>>>
>>> Please let me know if you would like any additional information,
>>> and I will be happy to test patches.
>>>
>>> The same failure happens in 2.6.34.1, so the fix does not appear to
>>> be in the stable tree yet.
>>
>>
>> I added some printks to the offending code. It seems the problem
>> is that the fixed_bar_cap method in arch/x86/pci/mrst.c loops forever:
>>
>> # Endless loop of this spewing to console...
>>
>> pcie_cap: 268435456Checking vendor..
>> pos after shift: 256
>> Before read..
>
> Can you print out bus->number and devfn and look that up in lspci to
> find out which device it's hitting? It looks like there's a device with
> a PCI Express extended capability header that has a extended capability
> ID of 0000h and a next capability offset of 100h, which points to
> itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
> <= pos then it should give up and break out of the loop, since it means
> that the next capability pointer is invalidly pointing to the same or a
> previous entry..

Bailing out like that does let it boot.

As for the bus and devfn: bus: 0 devfn: 129 (decimal)

I'm not sure what to look for in lspci, but here is the output with -n:

[root@ice-si-dmz ~]# lspci -n
00:00.0 0600: 8086:25d8 (rev b1)
00:02.0 0604: 8086:25f7 (rev b1)
00:04.0 0604: 8086:25f8 (rev b1)
00:06.0 0604: 8086:25f9 (rev b1)
00:08.0 0880: 8086:1a38 (rev b1)
00:10.0 0600: 8086:25f0 (rev b1)
00:10.1 0600: 8086:25f0 (rev b1)
00:10.2 0600: 8086:25f0 (rev b1)
00:11.0 0600: 8086:25f1 (rev b1)
00:13.0 0600: 8086:25f3 (rev b1)
00:15.0 0600: 8086:25f5 (rev b1)
00:16.0 0600: 8086:25f6 (rev b1)
00:1d.0 0c03: 8086:2688 (rev 09)
00:1d.1 0c03: 8086:2689 (rev 09)
00:1d.2 0c03: 8086:268a (rev 09)
00:1d.7 0c03: 8086:268c (rev 09)
00:1e.0 0604: 8086:244e (rev d9)
00:1f.0 0601: 8086:2670 (rev 09)
00:1f.1 0101: 8086:269e (rev 09)
00:1f.2 0106: 8086:2681 (rev 09)
00:1f.3 0c05: 8086:269b (rev 09)
01:00.0 0604: 8086:3500 (rev 01)
01:00.3 0604: 8086:350c (rev 01)
02:00.0 0604: 8086:3510 (rev 01)
02:02.0 0604: 8086:3518 (rev 01)
04:00.0 0200: 8086:1096 (rev 01)
04:00.1 0200: 8086:1096 (rev 01)
06:00.0 0604: 111d:8018 (rev 04)
07:00.0 0604: 111d:8018 (rev 04)
07:01.0 0604: 111d:8018 (rev 04)
08:00.0 0200: 8086:10a4 (rev 06)
08:00.1 0200: 8086:10a4 (rev 06)
09:00.0 0200: 8086:10a4 (rev 06)
09:00.1 0200: 8086:10a4 (rev 06)
0a:00.0 0604: 111d:8018 (rev 04)
0b:00.0 0604: 111d:8018 (rev 04)
0b:01.0 0604: 111d:8018 (rev 04)
0c:00.0 0200: 8086:10a4 (rev 06)
0c:00.1 0200: 8086:10a4 (rev 06)
0d:00.0 0200: 8086:10a4 (rev 06)
0d:00.1 0200: 8086:10a4 (rev 06)
0e:01.0 0300: 1002:515e (rev 02)


Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
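
Assuming the read goes through MMCONFIG, as the boot log suggests ("PCI: Using MMCONFIG for extended config space", base 0xe0000000), the address being polled can be worked out from the usual ECAM layout. A small sketch, with a made-up helper name:

```c
#include <stdint.h>

/*
 * ECAM (MMCONFIG) address of a config register:
 *   base + (bus << 20) + (devfn << 12) + reg
 * where devfn is (device << 3) | function.
 * Hypothetical helper, for illustration only.
 */
uint64_t ecam_addr(uint64_t base, unsigned int bus,
		   unsigned int devfn, unsigned int reg)
{
	return base + ((uint64_t)bus << 20) + ((uint64_t)devfn << 12) + reg;
}
```

With base 0xe0000000, bus 0, devfn 129 and register 0x100 this comes to 0xe0081100, which lies inside the MMCONFIG window reported for buses 00-0d.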

2010-07-14 02:25:08

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/13/2010 07:19 PM, Jesse Barnes wrote:
> On Tue, 13 Jul 2010 18:17:18 -0700

> I thought a related bug was fixed already; the code should be returning
> all zeros for non-existent BAR reads.

The code I pasted was from the exact commit that failed the git bisect. I didn't
check whether it is identical in later versions, but it definitely won't boot in
2.6.34 on my system.

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-14 03:29:54

by Robert Hancock

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On Tue, Jul 13, 2010 at 8:22 PM, Ben Greear <[email protected]> wrote:
>> Can you print out bus->number and devfn and look that up in lspci to
>> find out which device it's hitting? It looks like there's a device with
>> a PCI Express extended capability header that has a extended capability
>> ID of 0000h and a next capability offset of 100h, which points to
>> itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
>> <= pos then it should give up and break out of the loop, since it means
>> that the next capability pointer is invalidly pointing to the same or a
>> previous entry..
>
> Bailing out like that does let it boot.
>
> As for the bus and devfn: bus: 0 devfn: 129 (decimal)
>
> I'm not sure what to look for in lspci, but here is the output with -n:

That will be device 0x10 function 1, this one:

00:10.1 0600: 8086:25f0 (rev b1)

Intel 5000 Series Chipset FSB Registers, apparently.. What does lspci
-vv show for that device?
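
The mapping from devfn 129 to 00:10.1 is just bit slicing: the low three bits are the function and the next five are the device. A minimal sketch using the same arithmetic as the kernel's PCI_SLOT() and PCI_FUNC() macros (the function names here are made up):

```c
/* Device (slot) number: bits 7:3 of devfn. */
unsigned int devfn_slot(unsigned int devfn)
{
	return (devfn >> 3) & 0x1f;
}

/* Function number: bits 2:0 of devfn. */
unsigned int devfn_func(unsigned int devfn)
{
	return devfn & 0x07;
}
```

129 is 0x81, so the device is 0x10 and the function is 1.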

>
> [root@ice-si-dmz ~]# lspci -n
> 00:00.0 0600: 8086:25d8 (rev b1)
> 00:02.0 0604: 8086:25f7 (rev b1)
> 00:04.0 0604: 8086:25f8 (rev b1)
> 00:06.0 0604: 8086:25f9 (rev b1)
> 00:08.0 0880: 8086:1a38 (rev b1)
> 00:10.0 0600: 8086:25f0 (rev b1)
> 00:10.1 0600: 8086:25f0 (rev b1)
> 00:10.2 0600: 8086:25f0 (rev b1)
> 00:11.0 0600: 8086:25f1 (rev b1)
> 00:13.0 0600: 8086:25f3 (rev b1)
> 00:15.0 0600: 8086:25f5 (rev b1)
> 00:16.0 0600: 8086:25f6 (rev b1)
> 00:1d.0 0c03: 8086:2688 (rev 09)
> 00:1d.1 0c03: 8086:2689 (rev 09)
> 00:1d.2 0c03: 8086:268a (rev 09)
> 00:1d.7 0c03: 8086:268c (rev 09)
> 00:1e.0 0604: 8086:244e (rev d9)
> 00:1f.0 0601: 8086:2670 (rev 09)
> 00:1f.1 0101: 8086:269e (rev 09)
> 00:1f.2 0106: 8086:2681 (rev 09)
> 00:1f.3 0c05: 8086:269b (rev 09)
> 01:00.0 0604: 8086:3500 (rev 01)
> 01:00.3 0604: 8086:350c (rev 01)
> 02:00.0 0604: 8086:3510 (rev 01)
> 02:02.0 0604: 8086:3518 (rev 01)
> 04:00.0 0200: 8086:1096 (rev 01)
> 04:00.1 0200: 8086:1096 (rev 01)
> 06:00.0 0604: 111d:8018 (rev 04)
> 07:00.0 0604: 111d:8018 (rev 04)
> 07:01.0 0604: 111d:8018 (rev 04)
> 08:00.0 0200: 8086:10a4 (rev 06)
> 08:00.1 0200: 8086:10a4 (rev 06)
> 09:00.0 0200: 8086:10a4 (rev 06)
> 09:00.1 0200: 8086:10a4 (rev 06)
> 0a:00.0 0604: 111d:8018 (rev 04)
> 0b:00.0 0604: 111d:8018 (rev 04)
> 0b:01.0 0604: 111d:8018 (rev 04)
> 0c:00.0 0200: 8086:10a4 (rev 06)
> 0c:00.1 0200: 8086:10a4 (rev 06)
> 0d:00.0 0200: 8086:10a4 (rev 06)
> 0d:00.1 0200: 8086:10a4 (rev 06)
> 0e:01.0 0300: 1002:515e (rev 02)
>
>
> Thanks,
> Ben
>
>
> --
> Ben Greear <[email protected]>
> Candela Technologies Inc http://www.candelatech.com
>

2010-07-14 14:14:50

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/13/2010 08:29 PM, Robert Hancock wrote:
> On Tue, Jul 13, 2010 at 8:22 PM, Ben Greear<[email protected]> wrote:
>>> Can you print out bus->number and devfn and look that up in lspci to
>>> find out which device it's hitting? It looks like there's a device with
>>> a PCI Express extended capability header that has a extended capability
>>> ID of 0000h and a next capability offset of 100h, which points to
>>> itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
>>> <= pos then it should give up and break out of the loop, since it means
>>> that the next capability pointer is invalidly pointing to the same or a
>>> previous entry..
>>
>> Bailing out like that does let it boot.
>>
>> As for the bus and devfn: bus: 0 devfn: 129 (decimal)
>>
>> I'm not sure what to look for in lspci, but here is the output with -n:
>
> That will be device 0x10 function 1, this one:
>
> 00:10.1 0600: 8086:25f0 (rev b1)
>
> Intel 5000 Series Chipset FSB Registers, apparently.. What does lspci
> -vv show for that device?

00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
Subsystem: Super Micro Computer Inc Unknown device 9780
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5000_edac, i5k_amb

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-14 15:36:27

by Jacob Pan

Subject: RE: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

what is the config size of 10.1?
ls -l /sys/bus/pci/devices/0000:00:10.1/config

if that is 256, it might be related to this patch.

From e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <[email protected]>
Date: Fri, 14 May 2010 13:55:57 -0700
Subject: [PATCH] x86, mrst: Don't blindly access extended config space

Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform. The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.

This fixes booting certain VMware configurations with CONFIG_MRST=y.

Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.

Reported-and-tested-by: Petr Vandrovec <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
Acked-by: Jacob Pan <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
LKML-Reference: <[email protected]>
---
arch/x86/pci/mrst.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 8bf2fcb..1cdc02c 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -247,6 +247,10 @@ static void __devinit pci_fixed_bar_fixup(struct pci_dev *dev)
 	u32 size;
 	int i;
 
+	/* Must have extended configuration space */
+	if (dev->cfg_size < PCIE_CAP_OFFSET + 4)
+		return;
+
 	/* Fixup the BAR sizes for fixed BAR devices and make them unmoveable */
 	offset = fixed_bar_cap(dev->bus, dev->devfn);
 	if (!offset || PCI_DEVFN(2, 0) == dev->devfn ||
--
1.6.3.3


>-----Original Message-----
>From: Ben Greear [mailto:[email protected]]
>Sent: Wednesday, July 14, 2010 7:15 AM
>To: Robert Hancock
>Cc: linux-kernel; [email protected]; Pan, Jacob jun
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On 07/13/2010 08:29 PM, Robert Hancock wrote:
>> On Tue, Jul 13, 2010 at 8:22 PM, Ben Greear<[email protected]>
>wrote:
>>>> Can you print out bus->number and devfn and look that up in lspci to
>>>> find out which device it's hitting? It looks like there's a device
>with
>>>> a PCI Express extended capability header that has a extended
>capability
>>>> ID of 0000h and a next capability offset of 100h, which points to
>>>> itself, causing the infinite loop. I'm guessing that if pcie_cap>>
>20
>>>> <= pos then it should give up and break out of the loop, since it
>means
>>>> that the next capability pointer is invalidly pointing to the same
>or a
>>>> previous entry..
>>>
>>> Bailing out like that does let it boot.
>>>
>>> As for the bus and devfn: bus: 0 devfn: 129 (decimal)
>>>
>>> I'm not sure what to look for in lspci, but here is the output with -
>n:
>>
>> That will be device 0x10 function 1, this one:
>>
>> 00:10.1 0600: 8086:25f0 (rev b1)
>>
>> Intel 5000 Series Chipset FSB Registers, apparently.. What does lspci
>> -vv show for that device?
>
>00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers
>(rev b1)
> Subsystem: Super Micro Computer Inc Unknown device 9780
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
>ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
><TAbort- <MAbort- >SERR- <PERR- INTx-
> Kernel modules: i5000_edac, i5k_amb
>
>Thanks,
>Ben
>
>
>--
>Ben Greear <[email protected]>
>Candela Technologies Inc http://www.candelatech.com

2010-07-14 16:09:49

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 08:36 AM, Pan, Jacob jun wrote:
> what is the config size of 10.1?
> ls -l /sys/bus/pci/devices/0000:00:10.1/config
>
> if that is 256, it might be related to this patch.
>
>> From e9b1d5d0ff4d3ae86050dc4c91b3147361c7af9e Mon Sep 17 00:00:00 2001
> From: H. Peter Anvin<[email protected]>
> Date: Fri, 14 May 2010 13:55:57 -0700
> Subject: [PATCH] x86, mrst: Don't blindly access extended config space
>
> Do not blindly access extended configuration space unless we actively
> know we're on a Moorestown platform. The fixed-size BAR capability
> lives in the extended configuration space, and thus is not applicable
> if the configuration space isn't appropriately sized.
>
> This fixes booting certain VMware configurations with CONFIG_MRST=y.
>
> Moorestown will add a fake PCI-X 266 capability to advertise the
> presence of extended configuration space.

I'll try this in a bit, but shouldn't we also check for no-progress in
that while loop and bail out in that case? No reason to hang on
boot just because the bios or whatever is busted?

Thanks,
Ben

>
> Reported-and-tested-by: Petr Vandrovec<[email protected]>
> Signed-off-by: H. Peter Anvin<[email protected]>
> Acked-by: Jacob Pan<[email protected]>
> Acked-by: Jesse Barnes<[email protected]>
> LKML-Reference:<[email protected]>
> ---
> arch/x86/pci/mrst.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
> index 8bf2fcb..1cdc02c 100644
> --- a/arch/x86/pci/mrst.c
> +++ b/arch/x86/pci/mrst.c
> @@ -247,6 +247,10 @@ static void __devinit pci_fixed_bar_fixup(struct pci_dev *dev)
> u32 size;
> int i;
>
> + /* Must have extended configuration space */
> + if (dev->cfg_size< PCIE_CAP_OFFSET + 4)
> + return;
> +
> /* Fixup the BAR sizes for fixed BAR devices and make them unmoveable */
> offset = fixed_bar_cap(dev->bus, dev->devfn);
> if (!offset || PCI_DEVFN(2, 0) == dev->devfn ||


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-14 16:11:45

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 08:36 AM, Pan, Jacob jun wrote:
> what is the config size of 10.1?
> ls -l /sys/bus/pci/devices/0000:00:10.1/config
>
> if that is 256, it might be related to this patch.

[root@ice-si-dmz ~]# ls -l /sys/bus/pci/devices/0000:00:10.1/config
-rw-r--r-- 1 root root 4096 2010-07-13 19:14 /sys/bus/pci/devices/0000:00:10.1/config

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
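
The 4096-byte config file reported above means the cfg_size guard from the quoted patch does not return early for this device, so the capability walk still runs. A small sketch of that comparison, assuming PCIE_CAP_OFFSET is 0x100 (the start of extended config space); the function name is hypothetical:

```c
#include <stdbool.h>

/* Assumed value: extended config space begins at offset 0x100. */
#define PCIE_CAP_OFFSET 0x100

/* Mirrors the guard in the quoted patch: true when the fixup would
 * return early because the device's config space is too small to hold
 * even one extended capability header. */
static bool fixup_bails_out(int cfg_size)
{
	return cfg_size < PCIE_CAP_OFFSET + 4;
}
```

A 256-byte config space gives 256 < 0x104, so the fixup is skipped; 4096 does not, so the guard alone cannot prevent the hang on this machine.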

2010-07-14 17:06:55

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 08:36 AM, Pan, Jacob jun wrote:
> what is the config size of 10.1?
> ls -l /sys/bus/pci/devices/0000:00:10.1/config
>
> if that is 256, it might be related to this patch.

That patch is already in 2.6.34.y (with slight white-space
change it seems: space before <).

I just posted a patch to lkml that fixes the problem for me,
based on a suggestion by Robert Hancock.

I think this or something similar should go to 2.6.34.y stable
as well.

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-14 18:19:52

by Jacob Pan

Subject: RE: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

>-----Original Message-----
>From: Ben Greear [mailto:[email protected]]
>Sent: Wednesday, July 14, 2010 10:07 AM
>To: Pan, Jacob jun
>Cc: Robert Hancock; linux-kernel; [email protected]
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On 07/14/2010 08:36 AM, Pan, Jacob jun wrote:
>> what is the config size of 10.1?
>> ls -l /sys/bus/pci/devices/0000:00:10.1/config
>>
>> if that is 256, it might be related to this patch.
>
>That patch is already in 2.6.34.y (with slight white-space
>change it seems: space before <).
>
>I just posted a patch to lkml that fixes the problem for me,
>based on a suggestion by Robert Hancock.
>
>I think this or something similar should go to 2.6.34.y stable
>as well.
>


I have not seen the patch yet, but there is no guarantee that
capabilities are always laid out in ascending address. So I think
we cannot bail out when
pcie_cap >> 20 <= pos

If that is some bug in the config space, can we fix it with some quirks?

Thanks,

Jacob

2010-07-14 18:22:48

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:19 AM, Pan, Jacob jun wrote:
>> -----Original Message-----
>> From: Ben Greear [mailto:[email protected]]
>> Sent: Wednesday, July 14, 2010 10:07 AM
>> To: Pan, Jacob jun
>> Cc: Robert Hancock; linux-kernel; [email protected]
>> Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>> de08e2c26
>>
>> On 07/14/2010 08:36 AM, Pan, Jacob jun wrote:
>>> what is the config size of 10.1?
>>> ls -l /sys/bus/pci/devices/0000:00:10.1/config
>>>
>>> if that is 256, it might be related to this patch.
>>
>> That patch is already in 2.6.34.y (with slight white-space
>> change it seems: space before <).
>>
>> I just posted a patch to lkml that fixes the problem for me,
>> based on a suggestion by Robert Hancock.
>>
>> I think this or something similar should go to 2.6.34.y stable
>> as well.
>>
>
>
> I have not seen the patch yet, but there is no guarantee that
> capabilities are always laid out in ascending address. So I think
> we cannot bail out when
> pcie_cap >> 20 <= pos
>
> If that is some bug in the config space, can we fix it with some quirks?

No idea, but if it's on this one motherboard/device, I imagine it's somewhere
else as well.

Is there at least a maximum number of capabilities that can exist so that
you can limit the loop by that?

Thanks,
Ben

>
> Thanks,
>
> Jacob
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-14 18:25:39

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:19 AM, Pan, Jacob jun wrote:
>
> I have not seen the patch yet, but there is no guarantee that
> capabilities are always laid out in ascending address. So I think
> we cannot bail out when
> pcie_cap >> 20 <= pos
>
> If that is some bug in the config space, can we fix it with some quirks?
>

I don't understand where that arithmetic comes from.

Basic config space [0-255] and extended config space [256-4095] are laid
out completely differently, and they have separate capability chains.
In theory one could have extended capabilities in basic config space,
but since the root of that chain is at 0x100, you'd have to have
extended config space available anyway in order to see it.

-hpa

2010-07-14 18:35:39

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 07:14 AM, Ben Greear wrote:
> On 07/13/2010 08:29 PM, Robert Hancock wrote:
>> On Tue, Jul 13, 2010 at 8:22 PM, Ben Greear <[email protected]> wrote:
>>>> Can you print out bus->number and devfn and look that up in lspci to
>>>> find out which device it's hitting? It looks like there's a device with
>>>> a PCI Express extended capability header that has an extended capability
>>>> ID of 0000h and a next capability offset of 100h, which points to
>>>> itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
>>>> <= pos then it should give up and break out of the loop, since it means
>>>> that the next capability pointer is invalidly pointing to the same or a
>>>> previous entry..
>>>
>>> Bailing out like that does let it boot.
>>>
>>> As for the bus and devfn: bus: 0 devfn: 129 (decimal)
>>>
>>> I'm not sure what to look for in lspci, but here is the output with -n:
>>
>> That will be device 0x10 function 1, this one:
>>
>> 00:10.1 0600: 8086:25f0 (rev b1)
>>
>> Intel 5000 Series Chipset FSB Registers, apparently.. What does lspci
>> -vv show for that device?
>
> 00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
> Subsystem: Super Micro Computer Inc Unknown device 9780
> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Kernel modules: i5000_edac, i5k_amb
>

Could you get the output of lspci -vv -xxxx for this device? I'm
confused why this device would identify as having extended config space...

-hpa

2010-07-14 18:41:29

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:35 AM, H. Peter Anvin wrote:
> On 07/14/2010 07:14 AM, Ben Greear wrote:
>> On 07/13/2010 08:29 PM, Robert Hancock wrote:
>>> On Tue, Jul 13, 2010 at 8:22 PM, Ben Greear <[email protected]> wrote:
>>>>> Can you print out bus->number and devfn and look that up in lspci to
>>>>> find out which device it's hitting? It looks like there's a device with
>>>>> a PCI Express extended capability header that has an extended capability
>>>>> ID of 0000h and a next capability offset of 100h, which points to
>>>>> itself, causing the infinite loop. I'm guessing that if pcie_cap >> 20
>>>>> <= pos then it should give up and break out of the loop, since it means
>>>>> that the next capability pointer is invalidly pointing to the same or a
>>>>> previous entry..
>>>>
>>>> Bailing out like that does let it boot.
>>>>
>>>> As for the bus and devfn: bus: 0 devfn: 129 (decimal)
>>>>
>>>> I'm not sure what to look for in lspci, but here is the output with -n:
>>>
>>> That will be device 0x10 function 1, this one:
>>>
>>> 00:10.1 0600: 8086:25f0 (rev b1)
>>>
>>> Intel 5000 Series Chipset FSB Registers, apparently.. What does lspci
>>> -vv show for that device?
>>
>> 00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
>> Subsystem: Super Micro Computer Inc Unknown device 9780
>> Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>> Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> Kernel modules: i5000_edac, i5k_amb
>>
>
> Could you get the output of lspci -vv -xxxx for this device? I'm
> confused why this device would identify as having extended config space...
>
> -hpa


00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
Subsystem: Super Micro Computer Inc Unknown device 9780
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel modules: i5000_edac, i5k_amb
00: 86 80 f0 25 00 00 00 00 b1 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 80 97
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: f0 01 34 46 00 00 00 00 37 23 54 45 42 59 03 00
50: ff ff ff ff c8 00 04 00 16 1a d8 76 08 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 03 02 00 00 00 02 00 00 00 02 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00
a0: 00 00 00 20 00 00 00 00 00 00 00 00 ff 7f ff 0f
b0: ff 7f ff 0f ff 7f f0 0f ff 7f ff 0f 00 00 07 28
c0: 1a 09 04 00 00 08 00 00 ff ff ff 00 0f 00 ff 0f
d0: 00 ff 0f 00 00 f0 ff ff 00 00 c0 0f ff 60 d9 62
e0: 00 00 06 00 60 0c b4 01 00 08 00 00 00 00 00 ff
f0: f0 ff ff 0f ff 00 f0 ff 00 00 00 00 0f 00 3f 00
100: 00 00 00 10 00 00 00 00 00 00 00 10 00 00 00 00
110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff
130: 00 00 0f 0f 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160: 11 11 11 11 00 00 00 00 11 11 11 11 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 18 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 82 00 51 00 40 00 00 00 0c 01 08 00
1a0: 00 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff
1b0: 00 00 0f 0f 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 08 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
4f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
590: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
5f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
610: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
620: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
630: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
640: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
650: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
660: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
6f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
740: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
770: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
790: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
7f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
810: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
850: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
860: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
870: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
890: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
8f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
910: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
920: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
930: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
950: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
9f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
aa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ab0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ac0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ad0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ba0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
be0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ca0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ce0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
cf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
da0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
db0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
dd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
de0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
df0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ea0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
eb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00


Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
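
Decoding the first dword of the row labelled "100:" in the dump above shows why the walk never terminates. The sketch below is a user-space illustration only; the field macros are local restatements of the kernel's PCI_EXT_CAP_ID/VER/NEXT layout, and the names le32 and hdr_at_0x100 are hypothetical:

```c
#include <stdint.h>

/* Field extraction for a PCIe extended capability header, following
 * the layout used by the kernel's PCI_EXT_CAP_* macros. */
#define EXT_CAP_ID(h)   ((h) & 0x0000ffffu)
#define EXT_CAP_VER(h)  (((h) >> 16) & 0xfu)
#define EXT_CAP_NEXT(h) (((h) >> 20) & 0xffcu)

/* Config space is little-endian: assemble a dword from four dump bytes. */
static uint32_t le32(const uint8_t b[4])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* First four bytes at offset 0x100 in the dump: 00 00 00 10. */
static const uint8_t hdr_at_0x100[4] = { 0x00, 0x00, 0x00, 0x10 };
```

The bytes 00 00 00 10 form the dword 0x10000000: capability ID 0x0000, version 0, next pointer 0x100. The header at 0x100 therefore points back to itself, which is the self-loop guessed at earlier in the thread.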

2010-07-14 18:47:41

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:22 AM, Ben Greear wrote:
>
> Is there at least a maximum number of capabilities that can exist so that
> you can limit the loop by that?
>

Well, 3072 bytes and a minimum size of 4 bytes, so 768. However, a
capability ID of 0000 or FFFF means no capabilities (PCIe 2.01 sec
7.9.1-2), so we should terminate the search on finding one of those
capability IDs.

[Also note: bits 21:20 are reserved and need to be masked, per PCIe 2.01
7.9.3.]

The spec seems to imply that capabilities should be sequential, but I
really don't know if that is actually the case in the field.

-hpa

2010-07-14 18:52:48

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

So I suggest the following changes...

On 07/13/2010 07:19 PM, Jesse Barnes wrote:
>>
>> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
>> {
>> int pos;
>> u32 pcie_cap = 0, cap_data;
>> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
>> pos = PCIE_CAP_OFFSET;
>> while (pos) {

while (pos >= PCIE_CAP_OFFSET) {

>> printk("Before read..\n");
>> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> devfn, pos, 4, &pcie_cap))
>> return 0;
>> printk("pcie_cap: %u", pcie_cap);
>>
- if (pcie_cap == 0xffffffff)
- return 0;

+ if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
+ PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
+ break;

>> printk("Checking vendor..\n");

>> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
>> printk("reading domain_nr\n");
>> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> devfn, pos + 4, 4, &cap_data);
>> printk("cap_data: %u\n", cap_data);
>> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
>> return pos;
>> }
>>
>> pos = pcie_cap >> 20;

pos = (pcie_cap >> 20) & 0xffc;

>> printk("pos after shift: %i\n", pos);
>> }
>>
>> printk("Returning from fixed_bar_cap\n");
>> return 0;
>> }
>>
>>
>
> I thought a related bug was fixed already; the code should be returning
> all zeros for non-existent BAR reads.
>

2010-07-14 18:59:25

by Jesse Barnes

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On Wed, 14 Jul 2010 11:52:44 -0700
"H. Peter Anvin" <[email protected]> wrote:

> So I suggest the following changes...
>
> On 07/13/2010 07:19 PM, Jesse Barnes wrote:
> >>
> >> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
> >> {
> >> int pos;
> >> u32 pcie_cap = 0, cap_data;
> >> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
> >> pos = PCIE_CAP_OFFSET;
> >> while (pos) {
>
> while (pos >= PCIE_CAP_OFFSET) {
>
> >> printk("Before read..\n");
> >> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> >> devfn, pos, 4, &pcie_cap))
> >> return 0;
> >> printk("pcie_cap: %u", pcie_cap);
> >>
> - if (pcie_cap == 0xffffffff)
> - return 0;
>
> + if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
> + PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
> + break;
>
> >> printk("Checking vendor..\n");
>
> >> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
> >> printk("reading domain_nr\n");
> >> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
> >> devfn, pos + 4, 4, &cap_data);
> >> printk("cap_data: %u\n", cap_data);
> >> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
> >> return pos;
> >> }
> >>
> >> pos = pcie_cap >> 20;
>
> pos = (pcie_cap >> 20) & 0xffc;
>
> >> printk("pos after shift: %i\n", pos);
> >> }
> >>
> >> printk("Returning from fixed_bar_cap\n");
> >> return 0;
> >> }
> >>
> >>
> >
> > I thought a related bug was fixed already; the code should be returning
> > all zeros for non-existent BAR reads.

Changes look ok to me, though I'd prefer not hitting this code at all
on non-Moorestown if at all possible.

--
Jesse Barnes, Intel Open Source Technology Center

2010-07-14 19:01:12

by Jacob Pan

Subject: RE: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26



>-----Original Message-----
>From: Jesse Barnes [mailto:[email protected]]
>Sent: Wednesday, July 14, 2010 11:59 AM
>To: H. Peter Anvin
>Cc: Ben Greear; linux-kernel; Pan, Jacob jun
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On Wed, 14 Jul 2010 11:52:44 -0700
>"H. Peter Anvin" <[email protected]> wrote:
>
>> So I suggest the following changes...
>>
>> On 07/13/2010 07:19 PM, Jesse Barnes wrote:
>> >>
>> >> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
>> >> {
>> >> int pos;
>> >> u32 pcie_cap = 0, cap_data;
>> >> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
>> >> pos = PCIE_CAP_OFFSET;
>> >> while (pos) {
>>
>> while (pos >= PCIE_CAP_OFFSET) {
>>
>> >> printk("Before read..\n");
>> >> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> >> devfn, pos, 4, &pcie_cap))
>> >> return 0;
>> >> printk("pcie_cap: %u", pcie_cap);
>> >>
>> - if (pcie_cap == 0xffffffff)
>> - return 0;
>>
>> + if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
>> + PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
>> + break;
>>
>> >> printk("Checking vendor..\n");
>>
>> >> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
>> >> printk("reading domain_nr\n");
>> >> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>> >> devfn, pos + 4, 4, &cap_data);
>> >> printk("cap_data: %u\n", cap_data);
>> >> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
>> >> return pos;
>> >> }
>> >>
>> >> pos = pcie_cap >> 20;
>>
>> pos = (pcie_cap >> 20) & 0xffc;
>>
>> >> printk("pos after shift: %i\n", pos);
>> >> }
>> >>
>> >> printk("Returning from fixed_bar_cap\n");
>> >> return 0;
>> >> }
>> >>
>> >>
>> >
>> > I thought a related bug was fixed already; the code should be
>returning
>> > all zeros for non-existent BAR reads.
>
>Changes look ok to me, though I'd prefer not hitting this code at all
>on non-Moorestown if at all possible.

Can we use PCI_EXT_CAP_NEXT to replace this line?
pos = (pcie_cap >> 20) & 0xffc
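
For comparison, here is the open-coded expression next to a macro form. This is a user-space sketch: the macro below is a local copy written to match the PCI_EXT_CAP_NEXT definition in the kernel's pci_regs.h as I understand it (shift by 20, mask with 0xffc), and the three function names are hypothetical:

```c
#include <stdint.h>

/* Local copy so the example builds outside the kernel tree. */
#define PCI_EXT_CAP_NEXT(header) (((header) >> 20) & 0xffc)

/* The open-coded form proposed in the thread. */
static int next_open_coded(uint32_t pcie_cap)
{
	return (pcie_cap >> 20) & 0xffc;
}

/* The same computation through the macro. */
static int next_via_macro(uint32_t pcie_cap)
{
	return PCI_EXT_CAP_NEXT(pcie_cap);
}

/* The original, unmasked form: reserved bits 21:20 leak into the offset. */
static int next_unmasked(uint32_t pcie_cap)
{
	return pcie_cap >> 20;
}
```

Under that assumption the two masked forms agree for every header value. The unmasked form differs whenever the reserved low bits are set: a header of 0x10300001 gives 0x103 instead of the dword-aligned 0x100.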

2010-07-14 19:03:09

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:59 AM, Jesse Barnes wrote:
>
> Changes look ok to me, though I'd prefer not hitting this code at all
> on non-Moorestown if at all possible.
>

I guess that's okay in the short term, but I'm guessing fixed BARs in
some form is eventually going to make it into the general PCI spec.

-hpa

2010-07-14 19:27:52

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 11:52 AM, H. Peter Anvin wrote:
> So I suggest the following changes...
>
> On 07/13/2010 07:19 PM, Jesse Barnes wrote:
>>>
>>> static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
>>> {
>>> int pos;
>>> u32 pcie_cap = 0, cap_data;
>>> printk("fixed_bar_cap, bus: %p devfn: %u\n", bus, devfn);
>>> pos = PCIE_CAP_OFFSET;
>>> while (pos) {
>
> while (pos >= PCIE_CAP_OFFSET) {
>
>>> printk("Before read..\n");
>>> if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>>> devfn, pos, 4, &pcie_cap))
>>> return 0;
>>> printk("pcie_cap: %u", pcie_cap);
>>>
> - if (pcie_cap == 0xffffffff)
> - return 0;
>
> + if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
> + PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
> + break;

This seems to work for me. I'm using this patch against
2.6.34.y:

diff --git a/arch/x86/pci/mrst.c b/arch/x86/pci/mrst.c
index 1cdc02c..ee93fd0 100644
--- a/arch/x86/pci/mrst.c
+++ b/arch/x86/pci/mrst.c
@@ -61,13 +61,14 @@ static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
 	if (!raw_pci_ext_ops)
 		return 0;

-	while (pos) {
+	while (pos >= PCIE_CAP_OFFSET) {
 		if (raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
 					  devfn, pos, 4, &pcie_cap))
 			return 0;

-		if (pcie_cap == 0xffffffff)
-			return 0;
+		if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
+		    PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
+			break;

 		if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
 			raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
@@ -76,7 +77,7 @@ static int fixed_bar_cap(struct pci_bus *bus, unsigned int devfn)
 				return pos;
 		}

-		pos = pcie_cap >> 20;
+		pos = (pcie_cap >> 20) & 0xffc;
 	}

 	return 0;


Thanks,
Ben


>
>>> printk("Checking vendor..\n");
>
>>> if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
>>> printk("reading domain_nr\n");
>>> raw_pci_ext_ops->read(pci_domain_nr(bus), bus->number,
>>> devfn, pos + 4, 4, &cap_data);
>>> printk("cap_data: %u\n", cap_data);
>>> if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
>>> return pos;
>>> }
>>>
>>> pos = pcie_cap >> 20;
>
> pos = (pcie_cap >> 20) & 0xffc;
>
>>> printk("pos after shift: %i\n", pos);
>>> }
>>>
>>> printk("Returning from fixed_bar_cap\n");
>>> return 0;
>>> }
>>>
>>>
>>
>> I thought a related bug was fixed already; the code should be returning
>> all zeros for non-existent BAR reads.
>>
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
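
Editor's sketch: the patched loop above can be exercised outside the kernel against a fake 4 KiB config space. This is only an illustration under stated assumptions, not the code in mrst.c. The helpers fake_cfg_read(), fake_cfg_write(), fake_cfg_fill() and the function name fixed_bar_cap_sketch() are hypothetical, and the constant values (PCIE_CAP_OFFSET, PCI_EXT_CAP_ID_VNDR, PCIE_VNDR_CAP_ID_FIXED_BAR) are assumed rather than taken from this thread.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; check them against your own tree. */
#define PCIE_CAP_OFFSET            0x100  /* first extended capability */
#define PCI_EXT_CAP_ID_VNDR        0x0b   /* vendor-specific extended cap */
#define PCIE_VNDR_CAP_ID_FIXED_BAR 0x00
#define CFG_SPACE_SIZE             4096

#define PCI_EXT_CAP_ID(header)   ((header) & 0x0000ffff)
#define PCI_EXT_CAP_NEXT(header) (((header) >> 20) & 0xffc)

/* Hypothetical stand-in for raw_pci_ext_ops->read(); returns 0 on success. */
static int fake_cfg_read(const uint8_t *cfg, int pos, uint32_t *val)
{
	if (pos < 0 || pos + 4 > CFG_SPACE_SIZE)
		return -1;
	*val = (uint32_t)cfg[pos] |
	       ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) |
	       ((uint32_t)cfg[pos + 3] << 24);
	return 0;
}

/* Hypothetical helper: store a little-endian dword in the fake space. */
static void fake_cfg_write(uint8_t *cfg, int pos, uint32_t val)
{
	cfg[pos]     = (uint8_t)(val & 0xff);
	cfg[pos + 1] = (uint8_t)((val >> 8) & 0xff);
	cfg[pos + 2] = (uint8_t)((val >> 16) & 0xff);
	cfg[pos + 3] = (uint8_t)((val >> 24) & 0xff);
}

/* Hypothetical helper: fill the whole fake space with one byte value. */
static void fake_cfg_fill(uint8_t *cfg, uint8_t byte)
{
	memset(cfg, byte, CFG_SPACE_SIZE);
}

/* Same control flow as the patched loop, reading from the fake space. */
static int fixed_bar_cap_sketch(const uint8_t *cfg)
{
	int pos = PCIE_CAP_OFFSET;
	uint32_t pcie_cap = 0, cap_data = 0;

	/* Offsets below 0x100 are not extended capabilities: stop there. */
	while (pos >= PCIE_CAP_OFFSET) {
		if (fake_cfg_read(cfg, pos, &pcie_cap))
			return 0;

		/* ID 0x0000 or 0xffff means there is no capability here. */
		if (PCI_EXT_CAP_ID(pcie_cap) == 0x0000 ||
		    PCI_EXT_CAP_ID(pcie_cap) == 0xffff)
			break;

		if (PCI_EXT_CAP_ID(pcie_cap) == PCI_EXT_CAP_ID_VNDR) {
			if (fake_cfg_read(cfg, pos + 4, &cap_data))
				return 0;
			if ((cap_data & 0xffff) == PCIE_VNDR_CAP_ID_FIXED_BAR)
				return pos;
		}

		/* Mask to a dword-aligned 12-bit offset before following it. */
		pos = PCI_EXT_CAP_NEXT(pcie_cap);
	}

	return 0;
}
```

Filling the fake space with 0xff or 0x00 and calling fixed_bar_cap_sketch() exercises the two new exit conditions. Like the patch, the sketch has no iteration limit, so a capability list whose next pointers form a cycle would still loop.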

2010-07-14 19:55:38

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 12:01 PM, Pan, Jacob jun wrote:
>
> Can we use PCI_EXT_CAP_NEXT to replace this line?
> pos = (pcie_cap >> 20) & 0xffc
>

Presumably, I haven't checked the definition of that macro, but that
would be the logical semantics.

-hpa
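
Editor's sketch: as far as I recall, the extended capability header helpers in include/linux/pci_regs.h look like the definitions below, in which case PCI_EXT_CAP_NEXT() is the same expression as the open-coded line. Treat the macro bodies as an assumption to verify against your tree. The two wrapper functions are hypothetical and exist only to compare the two forms.

```c
#include <stdint.h>

/* Assumed to mirror include/linux/pci_regs.h; verify before relying on it. */
#define PCI_EXT_CAP_ID(header)   ((header) & 0x0000ffff)
#define PCI_EXT_CAP_VER(header)  (((header) >> 16) & 0xf)
#define PCI_EXT_CAP_NEXT(header) (((header) >> 20) & 0xffc)

/* The open-coded form proposed in this thread. */
static inline int next_cap_open_coded(uint32_t pcie_cap)
{
	return (pcie_cap >> 20) & 0xffc;
}

/* The same step written with the macro. */
static inline int next_cap_macro(uint32_t pcie_cap)
{
	return PCI_EXT_CAP_NEXT(pcie_cap);
}
```

With these definitions both functions return the same offset for any header value; the 0xffc mask clears the two low bits, so the result is always dword-aligned.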

2010-07-15 16:38:09

by Ben Greear

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/14/2010 12:55 PM, H. Peter Anvin wrote:
> On 07/14/2010 12:01 PM, Pan, Jacob jun wrote:
>>
>> Can we use PCI_EXT_CAP_NEXT to replace this line?
>> pos = (pcie_cap >> 20) & 0xffc
>>
>
> Presumably, I haven't checked the definition of that macro, but that
> would be the logical semantics.
>
> -hpa

Is one of you guys going to submit a fix upstream (and hopefully
to stable)? Since you guys actually understand the code, probably
best that you do it....

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2010-07-16 17:33:37

by Jacob Pan

Subject: RE: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

HPA,
If you have not done it already, I can generate a patch against tip tree
and do some testing on it today.

Thanks,

Jacob

>-----Original Message-----
>From: Ben Greear [mailto:[email protected]]
>Sent: Thursday, July 15, 2010 9:38 AM
>To: H. Peter Anvin
>Cc: Pan, Jacob jun; Jesse Barnes; linux-kernel
>Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected:
>de08e2c26
>
>On 07/14/2010 12:55 PM, H. Peter Anvin wrote:
>> On 07/14/2010 12:01 PM, Pan, Jacob jun wrote:
>>>
>>> Can we use PCI_EXT_CAP_NEXT to replace this line?
>>> pos = (pcie_cap >> 20) & 0xffc
>>>
>>
>> Presumably, I haven't checked the definition of that macro, but that
>> would be the logical semantics.
>>
>> -hpa
>
>Is one of you guys going to submit a fix upstream (and hopefully
>to stable)? Since you guys actually understand the code, probably
>best that you do it....
>
>Thanks,
>Ben
>
>--
>Ben Greear <[email protected]>
>Candela Technologies Inc http://www.candelatech.com

2010-07-16 18:28:39

by H. Peter Anvin

Subject: Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

On 07/16/2010 10:33 AM, Pan, Jacob jun wrote:
> HPA,
> If you have not done it already, I can generate a patch against tip tree
> and do some testing on it today.
>
> Thanks,
>
> Jacob
>

Please.

-hpa