Yinghai,
I finally found time to try to get some output using your patch for
resource.c on a kernel that hangs.
Some really good advice came in earlier today: I can use "vga=1" to
get 80x50 mode during the early boot sequence. I used that, and made
some alterations to the changes in your patch to squeeze more info
onto the screen. I also changed KERN_DEBUG to KERN_ERR in your
printk's so that I could decrease the other output by using
"loglevel=4".
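
For anyone following along, the KERN_ERR plus "loglevel=4" trick can be sketched in plain userspace C. This is only an illustration of the console filtering rule, not kernel code: the names LVL_ERR, LVL_DEBUG and shown_on_console are made up for the example, and the numeric levels (3 for KERN_ERR, 7 for KERN_DEBUG) are the usual printk values.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values behind KERN_ERR and KERN_DEBUG (hypothetical enum names). */
enum { LVL_ERR = 3, LVL_DEBUG = 7 };

/*
 * A message reaches the console only when its level is numerically
 * lower than the console loglevel. shown_on_console is an illustrative
 * helper, not a kernel function.
 */
static bool shown_on_console(int msg_level, int console_loglevel)
{
    assert(msg_level >= 0 && msg_level <= 7);
    return msg_level < console_loglevel;
}
```

With a console loglevel of 4, shown_on_console(LVL_ERR, 4) is true and shown_on_console(LVL_DEBUG, 4) is false, which is why the converted printk's stay visible while most other boot chatter disappears.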
While I cannot see the entire set of output from your debug printk's,
I can see the last 45+ lines that appear before the hang. The results
were the same as when I used "hpet=disable": the only difference
between the working 2.6.25 kernel and the hanging 2.6.27 kernel was
that the resource named "0000:00:14.0" (fed00000-fed003ff) switched
from "conflict=0" to "conflict=1".
The output from 'cat /proc/iomem' on a non-hanging kernel included
these two lines:
fed00000-fed003ff : HPET 0
fed00000-fed003ff : 0000:00:14.0
None of this is really new information, since it matches with info
I've already posted. I just wanted to let you know the results of
your patch on a kernel that hangs.
If I find a way to read the info from the 15-20 lines that scroll
away on me, and if I find a difference, I'll post that as well.
DW
On Fri, Aug 22, 2008 at 7:25 PM, David Witbrodt <[email protected]> wrote:
>
> Yinghai,
>
> I finally found time to try to get some output using your patch for
> resource.c on a kernel that hangs.
>
> Some really good advice came in earlier today: I can use "vga=1" to
> get 80x50 mode during the early boot sequence. I used that, and made
> some alterations to the changes in your patch to squeeze more info
> onto the screen. I also changed KERN_DEBUG to KERN_ERR in your
> printk's so that I could decrease the other output by using
> "loglevel=4".
>
> While I cannot see the entire set of output from your debug printk's,
> I can see the last 45+ lines that appear before the hang. The results
> were the same as when I used "hpet=disable": the only difference
> between the working 2.6.25 kernel and the hanging 2.6.27 kernel was
> that the resource named "0000:00:14.0" (fed00000-fed003ff) switched
> from "conflict=0" to "conflict=1".
>
> The output from 'cat /proc/iomem' on a non-hanging kernel included
> these two lines:
>
> fed00000-fed003ff : HPET 0
> fed00000-fed003ff : 0000:00:14.0
That means HPET is using insert_resource.
pnp 00:0d: mem resource (0xfed00000-0xfed000ff) overlaps 0000:00:14.0
BAR 1 (0xfed00000-0xfed003ff), disabling
request_resource: root: (PCI IO) [0, ffff], new: (0000:00:14.0) [fa00,
fa0f] conflict 0
request_resource: root: (PCI mem) [0, ffffffffffffffff], new:
(0000:00:14.0) [fed00000, fed003ff] conflict 1
pci 0000:00:14.0: BAR 1: can't allocate resource
piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xfa00, revision 0
This is because (0000:00:14.0) [fed00000, fed003ff] conflicts with
(reserved) [fec00000, ffffffff] from e820.
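
A rough userspace sketch of the range check behind that conflict, using the addresses from the log above. The struct and function names (struct range, ranges_overlap, range_contains) are made up for illustration and are not the kernel's own; both ends of a range are treated as inclusive, as in the log output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range; hypothetical type for illustration only. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* True when the two inclusive ranges share at least one address. */
static bool ranges_overlap(const struct range *a, const struct range *b)
{
    assert(a->start <= a->end && b->start <= b->end);
    return a->start <= b->end && b->start <= a->end;
}

/* True when inner lies entirely inside outer. */
static bool range_contains(const struct range *outer, const struct range *inner)
{
    assert(outer->start <= outer->end && inner->start <= inner->end);
    return outer->start <= inner->start && inner->end <= outer->end;
}
```

With the e820 entry as { 0xfec00000, 0xffffffff } and BAR 1 as { 0xfed00000, 0xfed003ff }, both checks return true: the BAR sits completely inside the reserved region, so a request against the same parent sees a conflict. The I/O BAR [fa00, fa0f] does not overlap either range.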
kernel:
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (Kernel
code) [200000, 56ce41]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (Kernel code) [200000, 56ce41]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (Kernel
data) [56ce42, 6d5f3f]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (Kernel data) [56ce42, 6d5f3f]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (Kernel
bss) [773000, 7b7647]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (Kernel bss) [773000, 7b7647]
E820:
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (System
RAM) [0, 9f3ff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (System RAM) [0, 9f3ff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new:
(reserved) [9f400, 9ffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (reserved) [9f400, 9ffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new:
(reserved) [f0000, fffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (reserved) [f0000, fffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (System
RAM) [100000, 77fdffff]
insert_resource: first: (Kernel code) [200000, 56ce41], new: (System
RAM) [100000, 77fdffff]
insert_resource: direct parent: (PCI mem) [0, ffffffffffffffff], new:
(System RAM) [100000, 77fdffff]
insert_resource: child: (Kernel code) [200000, 56ce41], new:
(System RAM) [100000, 77fdffff]
insert_resource: child: (Kernel data) [56ce42, 6d5f3f], new:
(System RAM) [100000, 77fdffff]
insert_resource: child: (Kernel bss) [773000, 7b7647], new: (System
RAM) [100000, 77fdffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (ACPI
Non-volatile Storage) [77fe0000, 77fe2fff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (ACPI Non-volatile Storage) [77fe0000,
77fe2fff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new: (ACPI
Tables) [77fe3000, 77feffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (ACPI Tables) [77fe3000, 77feffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new:
(reserved) [77ff0000, 77ffffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (reserved) [77ff0000, 77ffffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new:
(reserved) [e0000000, efffffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (reserved) [e0000000, efffffff]
insert_resource: parent: (PCI mem) [0, ffffffffffffffff], new:
(reserved) [fec00000, ffffffff]
insert_resource: good with request direct parent: (PCI mem) [0,
ffffffffffffffff], new: (reserved) [fec00000, ffffffff]
So in the old kernel, the lapic at 0xfee00000 is inserted first, which
prevents (reserved) [fec00000, ffffffff] from e820 from being registered.
Then (0000:00:14.0) [fed00000, fed003ff] gets a chance to be
registered, and the timer works.
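
The ordering argument above can be sketched with a toy model. Assumptions, loudly: every registration is treated as request_resource-style (it fails on any overlap with an existing sibling), the lapic window is taken as [fee00000, fee00fff], and the names res_table, try_register and bar1_registers are invented for this example. The real resource tree is more involved than a flat table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RES 8

/* Inclusive address range of one registered resource (toy model). */
struct res {
    uint64_t start;
    uint64_t end;
};

/* Flat list of siblings under one parent; hypothetical simplification. */
struct res_table {
    struct res slot[MAX_RES];
    size_t count;
};

/* Register [start, end] unless it overlaps an existing entry. */
static bool try_register(struct res_table *t, uint64_t start, uint64_t end)
{
    assert(start <= end);
    for (size_t i = 0; i < t->count; i++) {
        if (start <= t->slot[i].end && t->slot[i].start <= end)
            return false;           /* conflict with an earlier entry */
    }
    if (t->count == MAX_RES)
        return false;               /* table full */
    t->slot[t->count].start = start;
    t->slot[t->count].end = end;
    t->count++;
    return true;
}

/* Replay the boot order and report whether BAR 1 gets registered. */
static bool bar1_registers(bool lapic_first)
{
    struct res_table t = { .count = 0 };

    if (lapic_first)
        try_register(&t, 0xfee00000ULL, 0xfee00fffULL); /* lapic (assumed size) */
    try_register(&t, 0xfec00000ULL, 0xffffffffULL);     /* e820 reserved */
    return try_register(&t, 0xfed00000ULL, 0xfed003ffULL); /* 0000:00:14.0 BAR 1 */
}
```

In this model bar1_registers(true) succeeds because the lapic entry blocks the big e820 reserved range, leaving fed00000-fed003ff free. bar1_registers(false) fails because the e820 range is registered first and covers the BAR.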
The root cause could be that the failed request for the chipset BAR 1,
request_resource: root: (PCI mem) [0, ffffffffffffffff], new:
(0000:00:14.0) [fed00000, fed003ff] conflict 1
is not handled properly.
My guess is that when request_resource fails, the PCI code may clear
that BAR 1. We may need a quirk to preserve it,
just like we trust the IO APIC address in the BAR for some devices.
YH
Please send the output of these commands after booting with hpet=disable:
lspci -tv
lspci -vvxxx
YH