2014-02-24 16:24:07

by Borislav Petkov

Subject: Info: mapping multiple BARs. Your kernel is fine.

This started happening this morning after booting -rc4+tip, let's
add *everybody* to CC :-)

We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
other goodies on the stack.

...
[ 0.488998] software IO TLB [mem 0xcac30000-0xcec30000] (64MB) mapped at [ffff8800cac30000-ffff8800cec2ffff]
[ 0.489975] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
[ 0.490079] ------------[ cut here ]------------
[ 0.490204] WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x372/0x380()
[ 0.490306] Info: mapping multiple BARs. Your kernel is fine.
[ 0.490371] Modules linked in:
[ 0.490558] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #1
[ 0.490642] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
[ 0.490742] 00000000000000ab ffff880213d01ad8 ffffffff816112e3 0000000000000006
[ 0.491032] ffff880213d01b28 ffff880213d01b18 ffffffff8104e9bc ffff880213d01b08
[ 0.491343] ffffc90000c58000 00000000fed10000 00000000fed10000 0000000000006000
[ 0.491631] Call Trace:
[ 0.493337] [<ffffffff816112e3>] dump_stack+0x4f/0x7c
[ 0.493420] [<ffffffff8104e9bc>] warn_slowpath_common+0x8c/0xc0
[ 0.493503] [<ffffffff8104eaa6>] warn_slowpath_fmt+0x46/0x50
[ 0.493588] [<ffffffff8103f1e2>] __ioremap_caller+0x372/0x380
[ 0.493674] [<ffffffff810211a2>] ? snb_uncore_imc_init_box+0x62/0x90
[ 0.493761] [<ffffffff8103f247>] ioremap_nocache+0x17/0x20
[ 0.493846] [<ffffffff810211a2>] snb_uncore_imc_init_box+0x62/0x90
[ 0.493933] [<ffffffff81022925>] uncore_pci_probe+0xe5/0x1e0
[ 0.494020] [<ffffffff812d487e>] local_pci_probe+0x4e/0xa0
[ 0.494104] [<ffffffff81418a59>] ? get_device+0x19/0x20
[ 0.494213] [<ffffffff812d5cd1>] pci_device_probe+0xe1/0x130
[ 0.494300] [<ffffffff8141d3cb>] driver_probe_device+0x7b/0x240
[ 0.494385] [<ffffffff8141d63b>] __driver_attach+0xab/0xb0
[ 0.494469] [<ffffffff8141d590>] ? driver_probe_device+0x240/0x240
[ 0.494551] [<ffffffff8141b71e>] bus_for_each_dev+0x5e/0x90
[ 0.494634] [<ffffffff8141cede>] driver_attach+0x1e/0x20
[ 0.494718] [<ffffffff8141ca57>] bus_add_driver+0x117/0x230
[ 0.494802] [<ffffffff8141dd34>] driver_register+0x64/0xf0
[ 0.494884] [<ffffffff812d4c14>] __pci_register_driver+0x64/0x70
[ 0.494972] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
[ 0.495056] [<ffffffff81d03312>] intel_uncore_init+0x177/0x41c
[ 0.495155] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
[ 0.495242] [<ffffffff8100029e>] do_one_initcall+0x4e/0x170
[ 0.495326] [<ffffffff81071100>] ? parse_args+0x60/0x360
[ 0.495411] [<ffffffff81cfbfb8>] kernel_init_freeable+0x106/0x19a
[ 0.495497] [<ffffffff81cfb83b>] ? do_early_param+0x86/0x86
[ 0.495582] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
[ 0.495666] [<ffffffff81607efe>] kernel_init+0xe/0xf0
[ 0.495749] [<ffffffff81621f6c>] ret_from_fork+0x7c/0xb0
[ 0.495831] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
[ 0.495921] ---[ end trace 428f365c054d9a01 ]---
[ 0.496196] RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
[ 0.498598] futex hash table entries: 1024 (order: 5, 131072 bytes)
[ 0.498833] audit: initializing netlink subsys (disabled)
[ 0.499024] audit: type=2000 audit(1393259866.477:1): initialized
...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


2014-02-24 20:19:48

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Btw,

I don't know whether the following observation is related or not, but it
so happens that after resume from suspend-to-disk, I see the booting up
of the resume kernel on the console but when it is time for the original
kernel to take over and switch to graphics, the screen remains black but
the machine is responsive over the network.

And this doesn't happen on every resume but only sporadically.

And yep, -rc3 was fine.

On Mon, Feb 24, 2014 at 05:24:00PM +0100, Borislav Petkov wrote:
> This started happening this morning after booting -rc4+tip, let's
> add *everybody* to CC :-)
>
> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
> other goodies on the stack.
>
> ...
> [ 0.488998] software IO TLB [mem 0xcac30000-0xcec30000] (64MB) mapped at [ffff8800cac30000-ffff8800cec2ffff]
> [ 0.489975] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> [ 0.490079] ------------[ cut here ]------------
> [ 0.490204] WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x372/0x380()
> [ 0.490306] Info: mapping multiple BARs. Your kernel is fine.
> [ 0.490371] Modules linked in:
> [ 0.490558] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #1
> [ 0.490642] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
> [ 0.490742] 00000000000000ab ffff880213d01ad8 ffffffff816112e3 0000000000000006
> [ 0.491032] ffff880213d01b28 ffff880213d01b18 ffffffff8104e9bc ffff880213d01b08
> [ 0.491343] ffffc90000c58000 00000000fed10000 00000000fed10000 0000000000006000
> [ 0.491631] Call Trace:
> [ 0.493337] [<ffffffff816112e3>] dump_stack+0x4f/0x7c
> [ 0.493420] [<ffffffff8104e9bc>] warn_slowpath_common+0x8c/0xc0
> [ 0.493503] [<ffffffff8104eaa6>] warn_slowpath_fmt+0x46/0x50
> [ 0.493588] [<ffffffff8103f1e2>] __ioremap_caller+0x372/0x380
> [ 0.493674] [<ffffffff810211a2>] ? snb_uncore_imc_init_box+0x62/0x90
> [ 0.493761] [<ffffffff8103f247>] ioremap_nocache+0x17/0x20
> [ 0.493846] [<ffffffff810211a2>] snb_uncore_imc_init_box+0x62/0x90
> [ 0.493933] [<ffffffff81022925>] uncore_pci_probe+0xe5/0x1e0
> [ 0.494020] [<ffffffff812d487e>] local_pci_probe+0x4e/0xa0
> [ 0.494104] [<ffffffff81418a59>] ? get_device+0x19/0x20
> [ 0.494213] [<ffffffff812d5cd1>] pci_device_probe+0xe1/0x130
> [ 0.494300] [<ffffffff8141d3cb>] driver_probe_device+0x7b/0x240
> [ 0.494385] [<ffffffff8141d63b>] __driver_attach+0xab/0xb0
> [ 0.494469] [<ffffffff8141d590>] ? driver_probe_device+0x240/0x240
> [ 0.494551] [<ffffffff8141b71e>] bus_for_each_dev+0x5e/0x90
> [ 0.494634] [<ffffffff8141cede>] driver_attach+0x1e/0x20
> [ 0.494718] [<ffffffff8141ca57>] bus_add_driver+0x117/0x230
> [ 0.494802] [<ffffffff8141dd34>] driver_register+0x64/0xf0
> [ 0.494884] [<ffffffff812d4c14>] __pci_register_driver+0x64/0x70
> [ 0.494972] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.495056] [<ffffffff81d03312>] intel_uncore_init+0x177/0x41c
> [ 0.495155] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.495242] [<ffffffff8100029e>] do_one_initcall+0x4e/0x170
> [ 0.495326] [<ffffffff81071100>] ? parse_args+0x60/0x360
> [ 0.495411] [<ffffffff81cfbfb8>] kernel_init_freeable+0x106/0x19a
> [ 0.495497] [<ffffffff81cfb83b>] ? do_early_param+0x86/0x86
> [ 0.495582] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
> [ 0.495666] [<ffffffff81607efe>] kernel_init+0xe/0xf0
> [ 0.495749] [<ffffffff81621f6c>] ret_from_fork+0x7c/0xb0
> [ 0.495831] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
> [ 0.495921] ---[ end trace 428f365c054d9a01 ]---
> [ 0.496196] RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
> [ 0.498598] futex hash table entries: 1024 (order: 5, 131072 bytes)
> [ 0.498833] audit: initializing netlink subsys (disabled)
> [ 0.499024] audit: type=2000 audit(1393259866.477:1): initialized

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-25 15:49:00

by H. Peter Anvin

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On 02/24/2014 12:19 PM, Borislav Petkov wrote:
> Btw,
>
> I don't know whether the following observation is related or not, but it
> so happens that after resume from suspend-to-disk, I see the booting up
> of the resume kernel on the console but when it is time for the original
> kernel to take over and switch to graphics, the screen remains black but
> the machine is responsive over the network.
>
> And this doesn't happen on every resume but only sporadically.
>
> And yep, -rc3 was fine.
>
> On Mon, Feb 24, 2014 at 05:24:00PM +0100, Borislav Petkov wrote:
>> This started happening this morning after booting -rc4+tip, let's
>> add *everybody* to CC :-)
>>
>> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
>> other goodies on the stack.
>>

snb_uncore_imc_init_box() is newly introduced in tip:perf/core, and is a
relatively recent commit (b9e1ab6d4c0582cad97699285a6b3cf992251b00), so
I suspect that it wasn't in whatever -rc3 mix you were testing.

I am wondering if backing out or disabling that support (perhaps by
removing the relevant PCI ID) fixes the problem?

-hpa

2014-02-25 16:14:04

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Hi,

I am trying to understand your test case.
Were you actually measuring uncore_imc events at the time you suspended?

I tried on my IvyBridge Lenovo and it works fine with 3.14-rc4+ (tip.git).
I used: echo -n disk >/sys/power/state




On Tue, Feb 25, 2014 at 4:48 PM, H. Peter Anvin wrote:
> On 02/24/2014 12:19 PM, Borislav Petkov wrote:
>> Btw,
>>
>> I don't know whether the following observation is related or not, but it
>> so happens that after resume from suspend-to-disk, I see the booting up
>> of the resume kernel on the console but when it is time for the original
>> kernel to take over and switch to graphics, the screen remains black but
>> the machine is responsive over the network.
>>
>> And this doesn't happen on every resume but only sporadically.
>>
>> And yep, -rc3 was fine.
>>
>> On Mon, Feb 24, 2014 at 05:24:00PM +0100, Borislav Petkov wrote:
>>> This started happening this morning after booting -rc4+tip, let's
>>> add *everybody* to CC :-)
>>>
>>> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
>>> other goodies on the stack.
>>>
>
> snb_uncore_imc_init_box() is newly introduced in tip:perf/core, and is a
> relatively recent commit (b9e1ab6d4c0582cad97699285a6b3cf992251b00), so
> I suspect that it wasn't in whatever -rc3 mix you were testing.
>
> I am wondering if backing out or disabling that support (perhaps by
> removing the relevant PCI ID) fixes the problem?
>
> -hpa
>

2014-02-25 16:30:16

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Tue, Feb 25, 2014 at 05:14:01PM +0100, Stephane Eranian wrote:
> I am trying to understand your test case.
> Were you actually measuring uncore_imc events at the time you suspended?

No test case, just the machine booting; look at the printk timestamps.

> I tried on my IvyBridge Lenovo and it works fine with 3.14-rc4+
> (tip.git). I used: echo -n disk >/sys/power/state

That's an x230 too, right? What I do is, I take linus/master, merge
tip/master, Matt's efi/next tree and my edac/for-next tree into it and
then boot that.

I don't think that the edac and efi trees interfere though. I'll do a
fresh merge of only current tip/master into linus/master to test hpa's
suggestion in the other mail.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-25 16:33:16

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Tue, Feb 25, 2014 at 5:30 PM, Borislav Petkov wrote:
> On Tue, Feb 25, 2014 at 05:14:01PM +0100, Stephane Eranian wrote:
>> I am trying to understand your test case.
>> Were you actually measuring uncore_imc events at the time you suspended?
>
> No test case, just the machine booting; look at the printk timestamps.
>
>> I tried on my IvyBridge Lenovo and it works fine with 3.14-rc4+
>> (tip.git). I used: echo -n disk >/sys/power/state
>
> That's an x230 too, right? What I do is, I take linus/master, merge
> tip/master, Matt's efi/next tree and my edac/for-next tree into it and
> then boot that.

No, it's a T430s. What happens if you boot vanilla tip.git?

>
> I don't think that the edac and efi trees interfere though. I'll do a
> fresh merge of only current tip/master into linus/master to test hpa's
> suggestion in the other mail.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-02-25 17:39:51

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Tue, Feb 25, 2014 at 05:33:13PM +0100, Stephane Eranian wrote:
> No, it's a T430s. What happens if you boot vanilla tip.git?

linus/master + tip/master -> fails
tip/master -> fails

All trees are from today, like an hour ago or so.

Doing what hpa suggested:

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index b262c6124cf3..ec217d2d28dd 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -3871,6 +3871,7 @@ static int __init uncore_pci_init(void)
 		pci_uncores = snb_pci_uncores;
 		uncore_pci_driver = &snb_uncore_pci_driver;
 		break;
+#if 0
 	case 58: /* Ivy Bridge */
 		ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_IVB_IMC);
 		if (ret)
@@ -3878,6 +3879,7 @@ static int __init uncore_pci_init(void)
 		pci_uncores = snb_pci_uncores;
 		uncore_pci_driver = &ivb_uncore_pci_driver;
 		break;
+#endif
 	case 60: /* Haswell */
 	case 69: /* Haswell Celeron */
 		ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_HSW_IMC);

for model 58, IVB, works around the issue.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-25 18:54:55

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Tue, Feb 25, 2014 at 6:39 PM, Borislav Petkov wrote:
> On Tue, Feb 25, 2014 at 05:33:13PM +0100, Stephane Eranian wrote:
>> No, it's a T430s. What happens if you boot vanilla tip.git?
>
> linus/master + tip/master -> fails
> tip/master -> fails
>
> All trees are from today, like an hour ago or so.
>
> Doing what hpa suggested:
>
I am on tip.git at cfbf8d4 Linux 3.14-rc4
and I don't see the problem (using Ubuntu Saucy).

Given what you commented out, it seems like you're saying
something goes wrong with pci_get_device().
Am I missing some pm callbacks?

The uncore IMC is not used internally.

> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index b262c6124cf3..ec217d2d28dd 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -3871,6 +3871,7 @@ static int __init uncore_pci_init(void)
>  		pci_uncores = snb_pci_uncores;
>  		uncore_pci_driver = &snb_uncore_pci_driver;
>  		break;
> +#if 0
>  	case 58: /* Ivy Bridge */
>  		ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_IVB_IMC);
>  		if (ret)
> @@ -3878,6 +3879,7 @@ static int __init uncore_pci_init(void)
>  		pci_uncores = snb_pci_uncores;
>  		uncore_pci_driver = &ivb_uncore_pci_driver;
>  		break;
> +#endif
>  	case 60: /* Haswell */
>  	case 69: /* Haswell Celeron */
>  		ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_HSW_IMC);
>
> for model 58, IVB, works around the issue.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-02-25 22:10:44

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Tue, Feb 25, 2014 at 07:54:53PM +0100, Stephane Eranian wrote:

> I am on tip.git at cfbf8d4 Linux 3.14-rc4.
> and I don't see the problem (using Ubuntu Saucy).

Also IVB, model 58?

> Given what you commented out, it seems like you're saying
> something goes wrong with pci_get_device().

Probably. I'll add some debug printk's tomorrow to shed some more light
on the matter.

> Am I missing some pm callbacks?

Dunno. What do you mean by "pm callbacks" exactly? I don't know that
code so I have to ask.

> The uncore IMC is not used internally.

By IMC I'm assuming this PCI dev:

#define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154

?

And "internally" means by BIOS or something behind the curtains like
SMM...?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-26 06:57:00

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Hi,

On Tue, Feb 25, 2014 at 11:10 PM, Borislav Petkov wrote:
> On Tue, Feb 25, 2014 at 07:54:53PM +0100, Stephane Eranian wrote:
>
>> I am on tip.git at cfbf8d4 Linux 3.14-rc4.
>> and I don't see the problem (using Ubuntu Saucy).
>
> Also IVB, model 58?
>
Yes.

>> Given what you commented out, it seems like you're saying
>> something goes wrong with pci_get_device().
>
> Probably. I'll add some debug printk's tomorrow to shed some more light
> on the matter.
>
>> Am I missing some pm callbacks?
>
> Dunno. What do you mean by "pm callbacks" exactly? I don't know that
> code so I have to ask.
>
power management callbacks.

>> The uncore IMC is not used internally.
>
> By IMC I'm assuming this PCI dev:
>
> #define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
>
> ?
>
Yes. Needs to point to the DRAM controller.

> And "internally" means by BIOS or something behind the curtains like
> SMM...?
>
I meant by the kernel.

2014-02-26 09:29:14

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Feb 26, 2014 at 07:56:58AM +0100, Stephane Eranian wrote:
> > Also IVB, model 58?
> >
> Yes.

Right, so it must be chipset-specific.

> > Dunno. What do you mean by "pm callbacks" exactly? I don't know that
> > code so I have to ask.
> >
> power management callbacks.

Ok, just as I thought. But why would they be relevant if this happens
very early during boot?

> > #define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
> Yes. Needs to point to the DRAM controller.

It seems I have it :-)

$ lspci -xxx -s 00.0
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00: 86 80 54 01 06 00 90 20 09 00 00 06 00 00 00 00
          ^^^^^

10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 fa 21
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
50: 11 02 00 00 11 00 00 00 07 00 90 df 01 00 00 db
60: 05 00 00 f8 00 00 00 00 01 80 d1 fe 00 00 00 00
70: 00 00 00 fe 01 00 00 00 00 0c 00 fe 7f 00 00 00
80: 10 11 11 11 11 11 11 00 1a 00 00 00 00 00 00 00
90: 01 00 00 fe 01 00 00 00 01 00 50 1e 02 00 00 00
a0: 01 00 00 00 02 00 00 00 01 00 60 1e 02 00 00 00
b0: 01 00 a0 db 01 00 80 db 01 00 00 db 01 00 a0 df
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 09 00 0c 01 9b 61 00 e2 d0 00 e8 76 00 00 00 00
f0: 00 00 00 01 00 00 00 00 c8 0f 09 00 00 00 00 00

Anyway, here's some more debugging output and some more staring:

So we're correctly getting 0x154 and then snb_uncore_imc_init_box()
tries to ioremap 0xfed10000 but this fails the resource map check with:

[ 0.485356] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01

and the pnp 00:01 device already partially occupies that range (from
/proc/iomem):

fed10000-fed13fff : pnp 00:01

Oh, and snb_uncore_imc_init_box() gets that address from
SNB_UNCORE_PCI_IMC_BAR_OFFSET and SNB_UNCORE_PCI_IMC_BAR_OFFSET+4 and
they start at offset 0x48 in the PCI config space above, i.e.

40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
                            ^^^^^^^^^^^^^^^^^^^^^^^

which is 0x00000000fed10001 (the 0x1 bit disappears after addr &= ~(PAGE_SIZE - 1);)

So I'm guessing it is time to talk to platform guys and ask them why
they're putting SNB_UNCORE_PCI_IMC_BAR_OFFSET{,+4} in an overlapping
range with pnp 00:01.

[ 0.484023] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.484108] software IO TLB [mem 0xcac30000-0xcec30000] (64MB) mapped at [ffff8800cac30000-ffff8800cec2ffff]
[ 0.484971] DBG: will get device: 0x8086:154
[ 0.485054] DBG: Got device, bus: 0x0
[ 0.485254] DBG: ioremapping addr: 0xfed10000
[ 0.485356] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
[ 0.485460] ------------[ cut here ]------------
[ 0.485544] WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x372/0x380()
[ 0.485643] Info: mapping multiple BARs. Your kernel is fine.
[ 0.485709] Modules linked in:
[ 0.485935] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #6
[ 0.486019] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
[ 0.486117] 00000000000000ab ffff880213d01ad8 ffffffff81611339 0000000000000006
[ 0.486411] ffff880213d01b28 ffff880213d01b18 ffffffff8104e9cc ffff880213d01b08
[ 0.488308] ffffc90000c58000 00000000fed10000 00000000fed10000 0000000000006000
[ 0.488595] Call Trace:
[ 0.488671] [<ffffffff81611339>] dump_stack+0x4f/0x7c
[ 0.488754] [<ffffffff8104e9cc>] warn_slowpath_common+0x8c/0xc0
[ 0.488877] [<ffffffff8104eab6>] warn_slowpath_fmt+0x46/0x50
[ 0.488966] [<ffffffff8103f1f2>] __ioremap_caller+0x372/0x380
[ 0.489052] [<ffffffff810211b6>] ? snb_uncore_imc_init_box+0x76/0xa0
[ 0.489137] [<ffffffff8103f257>] ioremap_nocache+0x17/0x20
[ 0.489221] [<ffffffff810211b6>] snb_uncore_imc_init_box+0x76/0xa0
[ 0.489307] [<ffffffff81022935>] uncore_pci_probe+0xe5/0x1e0
[ 0.489391] [<ffffffff812d488e>] local_pci_probe+0x4e/0xa0
[ 0.489474] [<ffffffff81418a69>] ? get_device+0x19/0x20
[ 0.489558] [<ffffffff812d5ce1>] pci_device_probe+0xe1/0x130
[ 0.489642] [<ffffffff8141d3db>] driver_probe_device+0x7b/0x240
[ 0.489726] [<ffffffff8141d64b>] __driver_attach+0xab/0xb0
[ 0.489834] [<ffffffff8141d5a0>] ? driver_probe_device+0x240/0x240
[ 0.489920] [<ffffffff8141b72e>] bus_for_each_dev+0x5e/0x90
[ 0.490003] [<ffffffff8141ceee>] driver_attach+0x1e/0x20
[ 0.490086] [<ffffffff8141ca67>] bus_add_driver+0x117/0x230
[ 0.490170] [<ffffffff8141dd44>] driver_register+0x64/0xf0
[ 0.490251] [<ffffffff812d4c24>] __pci_register_driver+0x64/0x70
[ 0.490337] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
[ 0.490421] [<ffffffff81d03331>] intel_uncore_init+0x196/0x462
[ 0.490504] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
[ 0.490591] [<ffffffff8100029e>] do_one_initcall+0x4e/0x170
[ 0.490676] [<ffffffff81071100>] ? parse_args+0x50/0x360
[ 0.490762] [<ffffffff81cfbfb8>] kernel_init_freeable+0x106/0x19a
[ 0.490863] [<ffffffff81cfb83b>] ? do_early_param+0x86/0x86
[ 0.490948] [<ffffffff81607f00>] ? rest_init+0xd0/0xd0
[ 0.491032] [<ffffffff81607f0e>] kernel_init+0xe/0xf0
[ 0.491116] [<ffffffff81621fac>] ret_from_fork+0x7c/0xb0
[ 0.491199] [<ffffffff81607f00>] ? rest_init+0xd0/0xd0
[ 0.491289] ---[ end trace b31a7f760e34b24a ]---
[ 0.491547] RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
[ 0.493962] futex hash table entries: 1024 (order: 5, 131072 bytes)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-26 09:47:29

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Hi,

Ok, so I am getting the same error message as you.
I checked my syslog now.

I have my uncore_imc addr=0xfed10000 (after masking)

And I also have pnp 00:01 overlapping the imc range completely.

What pnp device does it really represent? The DRAM controller?

So I think my laptop behaves like yours.

On Wed, Feb 26, 2014 at 10:29 AM, Borislav Petkov wrote:
> On Wed, Feb 26, 2014 at 07:56:58AM +0100, Stephane Eranian wrote:
>> > Also IVB, model 58?
>> >
>> Yes.
>
> Right, so it must be chipset-specific.
>
>> > Dunno. What do you mean by "pm callbacks" exactly? I don't know that
>> > code so I have to ask.
>> >
>> power management callbacks.
>
> Ok, just as I thought. But why would they be relevant if this happens
> very early during boot?
>
>> > #define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
>> Yes. Needs to point to the DRAM controller.
>
> It seems I have it :-)
>
> $ lspci -xxx -s 00.0
> 00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
> 00: 86 80 54 01 06 00 90 20 09 00 00 06 00 00 00 00
>           ^^^^^
>
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 fa 21
> 30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00
> 40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
> 50: 11 02 00 00 11 00 00 00 07 00 90 df 01 00 00 db
> 60: 05 00 00 f8 00 00 00 00 01 80 d1 fe 00 00 00 00
> 70: 00 00 00 fe 01 00 00 00 00 0c 00 fe 7f 00 00 00
> 80: 10 11 11 11 11 11 11 00 1a 00 00 00 00 00 00 00
> 90: 01 00 00 fe 01 00 00 00 01 00 50 1e 02 00 00 00
> a0: 01 00 00 00 02 00 00 00 01 00 60 1e 02 00 00 00
> b0: 01 00 a0 db 01 00 80 db 01 00 00 db 01 00 a0 df
> c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> e0: 09 00 0c 01 9b 61 00 e2 d0 00 e8 76 00 00 00 00
> f0: 00 00 00 01 00 00 00 00 c8 0f 09 00 00 00 00 00
>
> Anyway, here's some more debugging output and some more staring:
>
> So we're correctly getting 0x154 and then snb_uncore_imc_init_box()
> tries to ioremap 0xfed10000 but this fails the resource map check with:
>
> [ 0.485356] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
>
> and the pnp 00:01 device already partially occupies that range (from
> /proc/iomem):
>
> fed10000-fed13fff : pnp 00:01
>
> Oh, and snb_uncore_imc_init_box() gets that address from
> SNB_UNCORE_PCI_IMC_BAR_OFFSET and SNB_UNCORE_PCI_IMC_BAR_OFFSET+4 and
> they start at offset 0x48 in the PCI config space above, i.e.
>
> 40: 01 90 d1 fe 00 00 00 00 01 00 d1 fe 00 00 00 00
>                             ^^^^^^^^^^^^^^^^^^^^^^^
>
> which is 0x00000000fed10001 (the 0x1 bit disappears after addr &= ~(PAGE_SIZE - 1);)
>
> So I'm guessing it is time to talk to platform guys and ask them why
> they're putting SNB_UNCORE_PCI_IMC_BAR_OFFSET{,+4} in an overlapping
> range with pnp 00:01.
>
> [ 0.484023] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [ 0.484108] software IO TLB [mem 0xcac30000-0xcec30000] (64MB) mapped at [ffff8800cac30000-ffff8800cec2ffff]
> [ 0.484971] DBG: will get device: 0x8086:154
> [ 0.485054] DBG: Got device, bus: 0x0
> [ 0.485254] DBG: ioremapping addr: 0xfed10000
> [ 0.485356] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> [ 0.485460] ------------[ cut here ]------------
> [ 0.485544] WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x372/0x380()
> [ 0.485643] Info: mapping multiple BARs. Your kernel is fine.
> [ 0.485709] Modules linked in:
> [ 0.485935] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #6
> [ 0.486019] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
> [ 0.486117] 00000000000000ab ffff880213d01ad8 ffffffff81611339 0000000000000006
> [ 0.486411] ffff880213d01b28 ffff880213d01b18 ffffffff8104e9cc ffff880213d01b08
> [ 0.488308] ffffc90000c58000 00000000fed10000 00000000fed10000 0000000000006000
> [ 0.488595] Call Trace:
> [ 0.488671] [<ffffffff81611339>] dump_stack+0x4f/0x7c
> [ 0.488754] [<ffffffff8104e9cc>] warn_slowpath_common+0x8c/0xc0
> [ 0.488877] [<ffffffff8104eab6>] warn_slowpath_fmt+0x46/0x50
> [ 0.488966] [<ffffffff8103f1f2>] __ioremap_caller+0x372/0x380
> [ 0.489052] [<ffffffff810211b6>] ? snb_uncore_imc_init_box+0x76/0xa0
> [ 0.489137] [<ffffffff8103f257>] ioremap_nocache+0x17/0x20
> [ 0.489221] [<ffffffff810211b6>] snb_uncore_imc_init_box+0x76/0xa0
> [ 0.489307] [<ffffffff81022935>] uncore_pci_probe+0xe5/0x1e0
> [ 0.489391] [<ffffffff812d488e>] local_pci_probe+0x4e/0xa0
> [ 0.489474] [<ffffffff81418a69>] ? get_device+0x19/0x20
> [ 0.489558] [<ffffffff812d5ce1>] pci_device_probe+0xe1/0x130
> [ 0.489642] [<ffffffff8141d3db>] driver_probe_device+0x7b/0x240
> [ 0.489726] [<ffffffff8141d64b>] __driver_attach+0xab/0xb0
> [ 0.489834] [<ffffffff8141d5a0>] ? driver_probe_device+0x240/0x240
> [ 0.489920] [<ffffffff8141b72e>] bus_for_each_dev+0x5e/0x90
> [ 0.490003] [<ffffffff8141ceee>] driver_attach+0x1e/0x20
> [ 0.490086] [<ffffffff8141ca67>] bus_add_driver+0x117/0x230
> [ 0.490170] [<ffffffff8141dd44>] driver_register+0x64/0xf0
> [ 0.490251] [<ffffffff812d4c24>] __pci_register_driver+0x64/0x70
> [ 0.490337] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.490421] [<ffffffff81d03331>] intel_uncore_init+0x196/0x462
> [ 0.490504] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.490591] [<ffffffff8100029e>] do_one_initcall+0x4e/0x170
> [ 0.490676] [<ffffffff81071100>] ? parse_args+0x50/0x360
> [ 0.490762] [<ffffffff81cfbfb8>] kernel_init_freeable+0x106/0x19a
> [ 0.490863] [<ffffffff81cfb83b>] ? do_early_param+0x86/0x86
> [ 0.490948] [<ffffffff81607f00>] ? rest_init+0xd0/0xd0
> [ 0.491032] [<ffffffff81607f0e>] kernel_init+0xe/0xf0
> [ 0.491116] [<ffffffff81621fac>] ret_from_fork+0x7c/0xb0
> [ 0.491199] [<ffffffff81607f00>] ? rest_init+0xd0/0xd0
> [ 0.491289] ---[ end trace b31a7f760e34b24a ]---
> [ 0.491547] RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
> [ 0.493962] futex hash table entries: 1024 (order: 5, 131072 bytes)
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-02-26 10:00:08

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Can you please, pretty please, not top-post...

On Wed, Feb 26, 2014 at 10:47:05AM +0100, Stephane Eranian wrote:
> Hi,
>
> Ok, so I am getting the same error message as you.
> I checked my syslog now.
>
> I have my uncore_imc addr=0xfed10000 (after masking)
>
> And I also have pnp 00:01 overlapping the imc range completely.
>
> What pnp device does it really represent? The DRAM controller?
>
> So I think my laptop behaves like yours.

grep -Er . /sys/devices/pnp0/00\:01/* 2>/dev/null
/sys/devices/pnp0/00:01/firmware_node/hid:PNP0C02
...

so this PNP0C02 is

[ 0.363943] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)

@Rafael, can you please make sense of this whole ACPI gunk?

We have a resource conflict with pnp 00:01, analysis here:
http://lkml.kernel.org/r/[email protected]

This is the rest of the 00:01 info from sysfs:

/sys/devices/pnp0/00:01/firmware_node/uid:0
/sys/devices/pnp0/00:01/firmware_node/path:\_SB_.PCI0.LPC_.SIO_
/sys/devices/pnp0/00:01/firmware_node/power/control:auto
/sys/devices/pnp0/00:01/firmware_node/power/runtime_active_time:0
/sys/devices/pnp0/00:01/firmware_node/power/runtime_status:unsupported
/sys/devices/pnp0/00:01/firmware_node/power/runtime_suspended_time:0
/sys/devices/pnp0/00:01/firmware_node/modalias:acpi:PNP0C02:
/sys/devices/pnp0/00:01/firmware_node/uevent:MODALIAS=acpi:PNP0C02:
/sys/devices/pnp0/00:01/id:PNP0c02
/sys/devices/pnp0/00:01/power/control:auto
/sys/devices/pnp0/00:01/power/runtime_active_time:0
/sys/devices/pnp0/00:01/power/runtime_status:unsupported
/sys/devices/pnp0/00:01/power/runtime_suspended_time:0
/sys/devices/pnp0/00:01/resources:state = active
/sys/devices/pnp0/00:01/resources:io 0x10-0x1f
/sys/devices/pnp0/00:01/resources:io 0x90-0x9f
/sys/devices/pnp0/00:01/resources:io 0x24-0x25
/sys/devices/pnp0/00:01/resources:io 0x28-0x29
/sys/devices/pnp0/00:01/resources:io 0x2c-0x2d
/sys/devices/pnp0/00:01/resources:io 0x30-0x31
/sys/devices/pnp0/00:01/resources:io 0x34-0x35
/sys/devices/pnp0/00:01/resources:io 0x38-0x39
/sys/devices/pnp0/00:01/resources:io 0x3c-0x3d
/sys/devices/pnp0/00:01/resources:io 0xa4-0xa5
/sys/devices/pnp0/00:01/resources:io 0xa8-0xa9
/sys/devices/pnp0/00:01/resources:io 0xac-0xad
/sys/devices/pnp0/00:01/resources:io 0xb0-0xb5
/sys/devices/pnp0/00:01/resources:io 0xb8-0xb9
/sys/devices/pnp0/00:01/resources:io 0xbc-0xbd
/sys/devices/pnp0/00:01/resources:io 0x50-0x53
/sys/devices/pnp0/00:01/resources:io 0x72-0x77
/sys/devices/pnp0/00:01/resources:io 0x400-0x47f
/sys/devices/pnp0/00:01/resources:io 0x500-0x57f
/sys/devices/pnp0/00:01/resources:io 0x800-0x80f
/sys/devices/pnp0/00:01/resources:io 0x15e0-0x15ef
/sys/devices/pnp0/00:01/resources:io 0x1600-0x167f
/sys/devices/pnp0/00:01/resources:mem 0xf8000000-0xfbffffff
/sys/devices/pnp0/00:01/resources:mem 0xfffff000-0xffffffff
/sys/devices/pnp0/00:01/resources:mem 0xfed1c000-0xfed1ffff
/sys/devices/pnp0/00:01/resources:mem 0xfed10000-0xfed13fff
/sys/devices/pnp0/00:01/resources:mem 0xfed18000-0xfed18fff
/sys/devices/pnp0/00:01/resources:mem 0xfed19000-0xfed19fff
/sys/devices/pnp0/00:01/resources:mem 0xfed45000-0xfed4bfff
/sys/devices/pnp0/00:01/resources:mem 0xfed40000-0xfed44fff
/sys/devices/pnp0/00:01/subsystem/drivers_autoprobe:1
/sys/devices/pnp0/00:01/uevent:DRIVER=system

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-26 13:42:16

by Rafael J. Wysocki

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Monday, February 24, 2014 05:24:00 PM Borislav Petkov wrote:
> This started happening this morning after booting -rc4+tip, let's
> add *everybody* to CC :-)

What about -rc4 without tip?

> We have intel_uncore_init, snb_uncore_imc_init_box, uncore_pci_probe and
> other goodies on the stack.
>
> ...
> [ 0.488998] software IO TLB [mem 0xcac30000-0xcec30000] (64MB) mapped at [ffff8800cac30000-ffff8800cec2ffff]
> [ 0.489975] resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> [ 0.490079] ------------[ cut here ]------------
> [ 0.490204] WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x372/0x380()
> [ 0.490306] Info: mapping multiple BARs. Your kernel is fine.
> [ 0.490371] Modules linked in:
> [ 0.490558] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #1
> [ 0.490642] Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
> [ 0.490742] 00000000000000ab ffff880213d01ad8 ffffffff816112e3 0000000000000006
> [ 0.491032] ffff880213d01b28 ffff880213d01b18 ffffffff8104e9bc ffff880213d01b08
> [ 0.491343] ffffc90000c58000 00000000fed10000 00000000fed10000 0000000000006000
> [ 0.491631] Call Trace:
> [ 0.493337] [<ffffffff816112e3>] dump_stack+0x4f/0x7c
> [ 0.493420] [<ffffffff8104e9bc>] warn_slowpath_common+0x8c/0xc0
> [ 0.493503] [<ffffffff8104eaa6>] warn_slowpath_fmt+0x46/0x50
> [ 0.493588] [<ffffffff8103f1e2>] __ioremap_caller+0x372/0x380
> [ 0.493674] [<ffffffff810211a2>] ? snb_uncore_imc_init_box+0x62/0x90
> [ 0.493761] [<ffffffff8103f247>] ioremap_nocache+0x17/0x20
> [ 0.493846] [<ffffffff810211a2>] snb_uncore_imc_init_box+0x62/0x90
> [ 0.493933] [<ffffffff81022925>] uncore_pci_probe+0xe5/0x1e0
> [ 0.494020] [<ffffffff812d487e>] local_pci_probe+0x4e/0xa0
> [ 0.494104] [<ffffffff81418a59>] ? get_device+0x19/0x20
> [ 0.494213] [<ffffffff812d5cd1>] pci_device_probe+0xe1/0x130
> [ 0.494300] [<ffffffff8141d3cb>] driver_probe_device+0x7b/0x240
> [ 0.494385] [<ffffffff8141d63b>] __driver_attach+0xab/0xb0
> [ 0.494469] [<ffffffff8141d590>] ? driver_probe_device+0x240/0x240
> [ 0.494551] [<ffffffff8141b71e>] bus_for_each_dev+0x5e/0x90
> [ 0.494634] [<ffffffff8141cede>] driver_attach+0x1e/0x20
> [ 0.494718] [<ffffffff8141ca57>] bus_add_driver+0x117/0x230
> [ 0.494802] [<ffffffff8141dd34>] driver_register+0x64/0xf0
> [ 0.494884] [<ffffffff812d4c14>] __pci_register_driver+0x64/0x70
> [ 0.494972] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.495056] [<ffffffff81d03312>] intel_uncore_init+0x177/0x41c
> [ 0.495155] [<ffffffff81d0319b>] ? uncore_types_init+0x19c/0x19c
> [ 0.495242] [<ffffffff8100029e>] do_one_initcall+0x4e/0x170
> [ 0.495326] [<ffffffff81071100>] ? parse_args+0x60/0x360
> [ 0.495411] [<ffffffff81cfbfb8>] kernel_init_freeable+0x106/0x19a
> [ 0.495497] [<ffffffff81cfb83b>] ? do_early_param+0x86/0x86
> [ 0.495582] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
> [ 0.495666] [<ffffffff81607efe>] kernel_init+0xe/0xf0
> [ 0.495749] [<ffffffff81621f6c>] ret_from_fork+0x7c/0xb0
> [ 0.495831] [<ffffffff81607ef0>] ? rest_init+0xd0/0xd0
> [ 0.495921] ---[ end trace 428f365c054d9a01 ]---
> [ 0.496196] RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
> [ 0.498598] futex hash table entries: 1024 (order: 5, 131072 bytes)
> [ 0.498833] audit: initializing netlink subsys (disabled)
> [ 0.499024] audit: type=2000 audit(1393259866.477:1): initialized
> ...
>
>

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-02-26 13:50:52

by Peter Zijlstra

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Feb 26, 2014 at 02:57:16PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 24, 2014 05:24:00 PM Borislav Petkov wrote:
> > This started happening this morning after booting -rc4+tip, let's
> > add *everybody* to CC :-)
>
> What about -rc4 without tip?

The driver causing this is new and lives in -tip.

2014-02-26 13:52:24

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Feb 26, 2014 at 02:57:16PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 24, 2014 05:24:00 PM Borislav Petkov wrote:
> > This started happening this morning after booting -rc4+tip, let's
> > add *everybody* to CC :-)
>
> What about -rc4 without tip?

I don't think so because

commit b9e1ab6d4c0582cad97699285a6b3cf992251b00
Author: Stephane Eranian <[email protected]>
Date: Tue Feb 11 16:20:12 2014 +0100

perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support

in -tip introduces that snb_uncore_imc_init_box() thing which causes the
ioremap conflict.

Btw, see my last email on this thread for more details about what I'm
seeing here.
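
For reference, the arithmetic behind the warning can be sketched in
user-space C. This is a simplified illustration, not the kernel's own
check (which looks at more conditions). struct range, map_range() and
spans_beyond() are names made up here, and the 0x6000-byte mapping size
is inferred from the 0xfed10000-0xfed15fff range printed in the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, like the start/end pairs in the warning. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Build the inclusive range covered by mapping `size` bytes at `addr`. */
static struct range map_range(uint64_t addr, uint64_t size)
{
	assert(size > 0);
	struct range r = { addr, addr + size - 1 };
	return r;
}

/*
 * True when the mapping touches the resource without being fully
 * contained in it, which is the situation the warning complains about.
 */
static bool spans_beyond(struct range map, struct range res)
{
	bool overlaps  = map.start <= res.end && map.end >= res.start;
	bool contained = map.start >= res.start && map.end <= res.end;

	return overlaps && !contained;
}
```

With a 0x6000-byte mapping at 0xfed10000 and the pnp 00:01 resource
0xfed10000-0xfed13fff, spans_beyond() returns true: the mapping ends at
0xfed15fff, 0x2000 bytes past the end of the reported resource.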

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-27 10:12:37

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Feb 26, 2014 at 10:59 AM, Borislav Petkov <[email protected]> wrote:
> Can you please, pretty please, not top-post...
>
> On Wed, Feb 26, 2014 at 10:47:05AM +0100, Stephane Eranian wrote:
>> Hi,
>>
>> Ok, so I am getting the same error message as you.
>> I checked my syslog now.
>>
>> I have my uncore_imc addr=0xfed10000 (after masking)
>>
>> And I also have pnp 00:01 overlapping the imc range completely.
>>
>> What pnp device does it really represent? the DRAM controller?
>>
>> So I think my laptop behaves like yours.
>
> grep -Er . /sys/devices/pnp0/00\:01/* 2>/dev/null
> /sys/devices/pnp0/00:01/firmware_node/hid:PNP0C02
> ...
>
> so this PNP0C02 is
>
> [ 0.363943] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
>
My Lenovo IVB is like yours. But I tried on my SandyBridge desktop and
there the BAR is at a completely different address. Same thing on my Haswell
desktop system.

As an aside, my SNB and HSW desktops with 3.14-rc4 are totally unstable.
They hang if I type make in my kernel tree, whereas 3.14-rc3 is stable. I am
not so sure this is all related to the uncore IMC support, though.

> @Rafael, can you please make sense of this whole ACPI gunk?
>
> We have a resource conflict with pnp 00:01, analysis here:
> http://lkml.kernel.org/r/[email protected]
>
> This is the rest of the 00:01 info from sysfs:
>
> /sys/devices/pnp0/00:01/firmware_node/uid:0
> /sys/devices/pnp0/00:01/firmware_node/path:\_SB_.PCI0.LPC_.SIO_
> /sys/devices/pnp0/00:01/firmware_node/power/control:auto
> /sys/devices/pnp0/00:01/firmware_node/power/runtime_active_time:0
> /sys/devices/pnp0/00:01/firmware_node/power/runtime_status:unsupported
> /sys/devices/pnp0/00:01/firmware_node/power/runtime_suspended_time:0
> /sys/devices/pnp0/00:01/firmware_node/modalias:acpi:PNP0C02:
> /sys/devices/pnp0/00:01/firmware_node/uevent:MODALIAS=acpi:PNP0C02:
> /sys/devices/pnp0/00:01/id:PNP0c02
> /sys/devices/pnp0/00:01/power/control:auto
> /sys/devices/pnp0/00:01/power/runtime_active_time:0
> /sys/devices/pnp0/00:01/power/runtime_status:unsupported
> /sys/devices/pnp0/00:01/power/runtime_suspended_time:0
> /sys/devices/pnp0/00:01/resources:state = active
> /sys/devices/pnp0/00:01/resources:io 0x10-0x1f
> /sys/devices/pnp0/00:01/resources:io 0x90-0x9f
> /sys/devices/pnp0/00:01/resources:io 0x24-0x25
> /sys/devices/pnp0/00:01/resources:io 0x28-0x29
> /sys/devices/pnp0/00:01/resources:io 0x2c-0x2d
> /sys/devices/pnp0/00:01/resources:io 0x30-0x31
> /sys/devices/pnp0/00:01/resources:io 0x34-0x35
> /sys/devices/pnp0/00:01/resources:io 0x38-0x39
> /sys/devices/pnp0/00:01/resources:io 0x3c-0x3d
> /sys/devices/pnp0/00:01/resources:io 0xa4-0xa5
> /sys/devices/pnp0/00:01/resources:io 0xa8-0xa9
> /sys/devices/pnp0/00:01/resources:io 0xac-0xad
> /sys/devices/pnp0/00:01/resources:io 0xb0-0xb5
> /sys/devices/pnp0/00:01/resources:io 0xb8-0xb9
> /sys/devices/pnp0/00:01/resources:io 0xbc-0xbd
> /sys/devices/pnp0/00:01/resources:io 0x50-0x53
> /sys/devices/pnp0/00:01/resources:io 0x72-0x77
> /sys/devices/pnp0/00:01/resources:io 0x400-0x47f
> /sys/devices/pnp0/00:01/resources:io 0x500-0x57f
> /sys/devices/pnp0/00:01/resources:io 0x800-0x80f
> /sys/devices/pnp0/00:01/resources:io 0x15e0-0x15ef
> /sys/devices/pnp0/00:01/resources:io 0x1600-0x167f
> /sys/devices/pnp0/00:01/resources:mem 0xf8000000-0xfbffffff
> /sys/devices/pnp0/00:01/resources:mem 0xfffff000-0xffffffff
> /sys/devices/pnp0/00:01/resources:mem 0xfed1c000-0xfed1ffff
> /sys/devices/pnp0/00:01/resources:mem 0xfed10000-0xfed13fff
> /sys/devices/pnp0/00:01/resources:mem 0xfed18000-0xfed18fff
> /sys/devices/pnp0/00:01/resources:mem 0xfed19000-0xfed19fff
> /sys/devices/pnp0/00:01/resources:mem 0xfed45000-0xfed4bfff
> /sys/devices/pnp0/00:01/resources:mem 0xfed40000-0xfed44fff
> /sys/devices/pnp0/00:01/subsystem/drivers_autoprobe:1
> /sys/devices/pnp0/00:01/uevent:DRIVER=system
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-02-27 10:27:31

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
> My Lenovo IVB is like yours. But I tried on my SandyBridge desktop and
> there the BAR is at a completely different address. Same thing on my
> Haswell desktop system.

Hrrm, I'd like to see what Rafael finds out, whether what we're reading
from PCI config space is even sane.

> As an aside, my SNB and HSW desktops with 3.14-rc4 are totally
> unstable. They hang if I type make in my kernel tree. Whereas 3.14-rc3
> is stable. I am not so sure this is all related to the uncore IMC
> support, though.

Easy to test - just disable the uncore thing.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-02-27 10:30:53

by Peter Zijlstra

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
> As an aside, my SNB and HSW desktops with 3.14-rc4 are totally unstable.
> They hang if I type make in my kernel tree. Whereas 3.14-rc3 is stable. I am
> not so sure this is all related to the uncore IMC support, though.

Unstable with 3.14-rc4-tip you mean? Yeah, there's a rather crucial
patch missing. I'll try and get Thomas to merge it if Ingo doesn't show
up soon.

2014-02-27 10:33:03

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 11:30 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
>> As an aside, my SNB and HSW desktops with 3.14-rc4 are totally unstable.
>> They hang if I type make in my kernel tree. Whereas 3.14-rc3 is stable. I am
>> not so sure this is all related to the uncore IMC support, though.
>
> Unstable with 3.14-rc4-tip you mean? Yeah, there's a rather crucial
> patch missing. I'll try and get Thomas to merge it if Ingo doesn't show
> up soon.

Yes, I mean from tip.git.

2014-02-27 11:09:03

by Peter Zijlstra

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 11:32:58AM +0100, Stephane Eranian wrote:
> On Thu, Feb 27, 2014 at 11:30 AM, Peter Zijlstra <[email protected]> wrote:
> > On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
> >> As an aside, my SNB and HSW desktops with 3.14-rc4 are totally unstable.
> >> They hang if I type make in my kernel tree. Whereas 3.14-rc3 is stable. I am
> >> not so sure this is all related to the uncore IMC support, though.
> >
> > Unstable with 3.14-rc4-tip you mean? Yeah, there's a rather crucial
> > patch missing. I'll try and get Thomas to merge it if Ingo doesn't show
> > up soon.
>
> Yes, I mean from tip.git.

lkml.kernel.org/r/[email protected]

Should cure things; unless there's more borkage.

2014-02-27 12:21:03

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 12:08 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Feb 27, 2014 at 11:32:58AM +0100, Stephane Eranian wrote:
>> On Thu, Feb 27, 2014 at 11:30 AM, Peter Zijlstra <[email protected]> wrote:
>> > On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
>> >> As an aside, my SNB and HSW desktops with 3.14-rc4 are totally unstable.
>> >> They hang if I type make in my kernel tree. Whereas 3.14-rc3 is stable. I am
>> >> not so sure this is all related to the uncore IMC support, though.
>> >
>> > Unstable with 3.14-rc4-tip you mean? Yeah, there's a rather crucial
>> > patch missing. I'll try and get Thomas to merge it if Ingo doesn't show
>> > up soon.
>>
>> Yes, I mean from tip.git.
>
> lkml.kernel.org/r/[email protected]
>
> Should cure things; unless there's more borkage.

Works again now with your patch.
Thanks.

2014-02-27 21:57:17

by Rafael J. Wysocki

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thursday, February 27, 2014 11:27:22 AM Borislav Petkov wrote:
> On Thu, Feb 27, 2014 at 11:12:32AM +0100, Stephane Eranian wrote:
> > My Lenovo IVB is like yours. But I tried on my SandyBridge desktop and
> > there the BAR is at a completely different address. Same thing on my
> > Haswell desktop system.
>
> Hrrm, I'd like to see what Rafael finds out, whether what we're reading
> from PCI config space is even sane.

I won't be able to look at that before Monday I'm afraid (personal stuff).

Rafael

2014-02-27 22:21:15

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Feb 27, 2014 at 11:12:17PM +0100, Rafael J. Wysocki wrote:
> I won't be able to look at that before Monday I'm afraid (personal
> stuff).

No worries, sir, whenever. It can wait.

Thanks a lot!

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-16 19:04:13

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
> Right. Even if we had this long-term solution, we'd still have
> Stephane's current problem, because the PNP0C02 _CRS is still wrong.
>
> We do have a drivers/pnp/quirks.c where we could conceivably adjust
> the PNP resource if we found the matching PCI device and MCHBAR. That
> should solve Stephane's problem even with the current
> drivers/pnp/system.c.

Guys, this still triggers in -rc1. Do we have a fix or something
testable at least?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-16 20:25:09

by Zhang, Rui

Subject: RE: Info: mapping multiple BARs. Your kernel is fine.



> -----Original Message-----
> From: Borislav Petkov [mailto:[email protected]]
> Sent: Wednesday, April 16, 2014 12:04 PM
> To: Bjorn Helgaas; Rafael J. Wysocki
> Cc: Zhang, Rui; Lu, Aaron; lkml; [email protected]; Linux PCI; ACPI Devel
> Maling List; Yinghai Lu; H. Peter Anvin; Stephane Eranian; Yan, Zheng Z
> Subject: Re: Info: mapping multiple BARs. Your kernel is fine.
> Importance: High
>
> On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
> > Right. Even if we had this long-term solution, we'd still have
> > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
> >
> > We do have a drivers/pnp/quirks.c where we could conceivably adjust
> > the PNP resource if we found the matching PCI device and MCHBAR. That
> > should solve Stephane's problem even with the current
> > drivers/pnp/system.c.
>
> Guys, this still triggers in -rc1. Do we have a fix or something
> testable at least?
>
Could you please attach the dmesg output after a fresh boot in -rc1?

Thanks,
rui
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --

2014-04-16 20:31:44

by Bjorn Helgaas

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 09:04:04PM +0200, Borislav Petkov wrote:
> On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
> > Right. Even if we had this long-term solution, we'd still have
> > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
> >
> > We do have a drivers/pnp/quirks.c where we could conceivably adjust
> > the PNP resource if we found the matching PCI device and MCHBAR. That
> > should solve Stephane's problem even with the current
> > drivers/pnp/system.c.
>
> Guys, this still triggers in -rc1. Do we have a fix or something
> testable at least?

Hi Boris,

Can you try the patch below?



PNP: Work around Haswell BIOS defect in MCH area reporting

From: Bjorn Helgaas <[email protected]>

Work around a Haswell BIOS defect that causes part of the MCH area to be
unreported.

MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
PNP0C02 resource. The MCH space was 16KB prior to Haswell, but it is 32KB
in Haswell. Some Haswell BIOSes still report a PNP0C02 resource that is
only 16KB, which means the rest of the MCH space is consumed but
unreported.

This can cause resource map sanity check warnings or (theoretically) a
device conflict if we assigned the unreported space to another device.

The Intel perf event uncore driver tripped over this when it claimed the
MCH region:

resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
Info: mapping multiple BARs. Your kernel is fine.

To prevent this, if we find a PNP0C02 resource that covers part of the MCH
space, extend it to cover the entire space.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pnp/quirks.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 258fef272ea7..023edf592371 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -334,6 +334,57 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
}
#endif

+static void quirk_intel_haswell_mch(struct pnp_dev *dev)
+{
+	struct pci_dev *host;
+	u32 addr_lo, addr_hi;
+	struct pci_bus_region region;
+	struct resource mch;
+	struct pnp_resource *pnp_res;
+	struct resource *res;
+
+	host = pci_get_device(PCI_VENDOR_ID_INTEL, 0x0c00, NULL);
+	if (!host)
+		return;
+
+	/*
+	 * MCHBAR is not an architected PCI BAR, so MCH space is usually
+	 * reported as a PNP0C02 resource. The MCH space was 16KB prior to
+	 * Haswell, but it is 32KB in Haswell. Some Haswell BIOSes still
+	 * report a PNP0C02 resource that is only 16KB, which means the
+	 * rest of the MCH space is consumed but unreported.
+	 */
+
+	/*
+	 * Read MCHBAR for Host Member Mapped Register Range Base
+	 * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
+	 * Sec 3.1.12.
+	 */
+	pci_read_config_dword(host, 0x48, &addr_lo);
+	region.start = addr_lo & ~0x7fff;
+	pci_read_config_dword(host, 0x4c, &addr_hi);
+	region.start |= (dma_addr_t) addr_hi << 32;
+	region.end = region.start + 32*1024 - 1;
+	pcibios_bus_to_resource(host->bus, &mch, &region);
+
+	list_for_each_entry(pnp_res, &dev->resources, list) {
+		res = &pnp_res->res;
+		if (res->end < mch.start || res->start > mch.end)
+			continue;	/* no overlap */
+		if (res->start == mch.start && res->end == mch.end)
+			continue;	/* exact match */
+
+		dev_info(&dev->dev, FW_BUG
+			 "%pR covers only part of Intel Haswell MCH; extending to %pR\n",
+			 res, &mch);
+		res->start = mch.start;
+		res->end = mch.end;
+		break;
+	}
+
+	pci_dev_put(host);
+}
+
/*
* PnP Quirks
* Cards or devices that need some tweaking due to incomplete resource info
@@ -364,6 +415,7 @@ static struct pnp_fixup pnp_fixups[] = {
#ifdef CONFIG_AMD_NB
{"PNP0c01", quirk_amd_mmconfig_area},
#endif
+ {"PNP0c02", quirk_intel_haswell_mch},
{""}
};
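
The core of the quirk can be tried outside the kernel. The sketch below
assumes plain integer ranges instead of struct resource, and mch_range()
and extend_to_mch() are hypothetical helper names. Only the 0x7fff mask,
the 32KB size and the overlap tests are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCH_SIZE (32 * 1024)	/* size the patch assumes for Haswell */

struct range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Combine the two config dwords and drop the low 15 bits, as the patch does. */
static struct range mch_range(uint32_t addr_lo, uint32_t addr_hi)
{
	struct range mch;

	mch.start = (uint64_t)(addr_lo & ~0x7fffu);
	mch.start |= (uint64_t)addr_hi << 32;
	mch.end = mch.start + MCH_SIZE - 1;
	return mch;
}

/*
 * Extend `res` to the full MCH range when it overlaps that range without
 * matching it exactly. Returns true if `res` was changed.
 */
static bool extend_to_mch(struct range *res, struct range mch)
{
	assert(res != NULL);

	if (res->end < mch.start || res->start > mch.end)
		return false;	/* no overlap */
	if (res->start == mch.start && res->end == mch.end)
		return false;	/* exact match */

	res->start = mch.start;
	res->end = mch.end;
	return true;
}
```

For example, with a hypothetical MCHBAR low dword of 0xfed10001 and a
high dword of 0, mch_range() yields 0xfed10000-0xfed17fff. A reported
resource of 0xfed10000-0xfed13fff is then extended to that range, while
0xfed18000-0xfed18fff is left alone.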

2014-04-16 22:31:59

by Dave Jones

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 02:31:38PM -0600, Bjorn Helgaas wrote:
> On Wed, Apr 16, 2014 at 09:04:04PM +0200, Borislav Petkov wrote:
> > On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
> > > Right. Even if we had this long-term solution, we'd still have
> > > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
> > >
> > > We do have a drivers/pnp/quirks.c where we could conceivably adjust
> > > the PNP resource if we found the matching PCI device and MCHBAR. That
> > > should solve Stephane's problem even with the current
> > > drivers/pnp/system.c.
> >
> > Guys, this still triggers in -rc1. Do we have a fix or something
> > testable at least?
>
> Hi Boris,
>
> Can you try the patch below?

I'm seeing the exact same message on my thinkpad t430s.
When I try your patch, modesetting no longer works. When it tries
to change to the framebuffer I get a black screen and lockup.
If I boot with nomodeset it locks up when it gets to X.
It all scrolls by too fast to read, but it looks like there's still
a backtrace present.

Dave

2014-04-16 22:56:08

by Bjorn Helgaas

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 06:31:22PM -0400, Dave Jones wrote:
> On Wed, Apr 16, 2014 at 02:31:38PM -0600, Bjorn Helgaas wrote:
> > On Wed, Apr 16, 2014 at 09:04:04PM +0200, Borislav Petkov wrote:
> > > On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
> > > > Right. Even if we had this long-term solution, we'd still have
> > > > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
> > > >
> > > > We do have a drivers/pnp/quirks.c where we could conceivably adjust
> > > > the PNP resource if we found the matching PCI device and MCHBAR. That
> > > > should solve Stephane's problem even with the current
> > > > drivers/pnp/system.c.
> > >
> > > Guys, this still triggers in -rc1. Do we have a fix or something
> > > testable at least?
> >
> > Hi Boris,
> >
> > Can you try the patch below?
>
> I'm seeing the exact same message on my thinkpad t430s.
> When I try your patch, modesetting no longer works. When it tries
> to change to the framebuffer I get a black screen and lockup.
> If I boot with nomodeset it locks up when it gets to X.
> It all scrolls by too fast to read, but it looks like there's still
> a backtrace present.

Ouch, sorry about that. I do see a bug in my patch (fixed below), but I
don't see how that could cause what you're seeing. Maybe I could figure
out something from this info (this can be from a kernel without my patch):

- dmesg log
- output of "find /sys/devices/pnp0 -name id -o -name resources | xargs grep ."
- output of "sudo lspci -s00:00.0 -xxx"



PNP: Work around Haswell BIOS defect in MCH area reporting

From: Bjorn Helgaas <[email protected]>

Work around a Haswell BIOS defect that causes part of the MCH area to be
unreported.

MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
PNP0C02 resource. The MCH space was 16KB prior to Haswell, but it is 32KB
in Haswell. Some Haswell BIOSes still report a PNP0C02 resource that is
only 16KB, which means the rest of the MCH space is consumed but
unreported.

This can cause resource map sanity check warnings or (theoretically) a
device conflict if we assigned the unreported space to another device.

The Intel perf event uncore driver tripped over this when it claimed the
MCH region:

resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
Info: mapping multiple BARs. Your kernel is fine.

To prevent this, if we find a PNP0C02 resource that covers part of the MCH
space, extend it to cover the entire space.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pnp/quirks.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 258fef272ea7..8402088d4145 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -334,6 +334,60 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
}
#endif

+static void quirk_intel_haswell_mch(struct pnp_dev *dev)
+{
+	struct pci_dev *host;
+	u32 addr_lo, addr_hi;
+	struct pci_bus_region region;
+	struct resource mch;
+	struct pnp_resource *pnp_res;
+	struct resource *res;
+
+	host = pci_get_device(PCI_VENDOR_ID_INTEL, 0x0c00, NULL);
+	if (!host)
+		return;
+
+	/*
+	 * MCHBAR is not an architected PCI BAR, so MCH space is usually
+	 * reported as a PNP0C02 resource. The MCH space was 16KB prior to
+	 * Haswell, but it is 32KB in Haswell. Some Haswell BIOSes still
+	 * report a PNP0C02 resource that is only 16KB, which means the
+	 * rest of the MCH space is consumed but unreported.
+	 */
+
+	/*
+	 * Read MCHBAR for Host Member Mapped Register Range Base
+	 * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
+	 * Sec 3.1.12.
+	 */
+	pci_read_config_dword(host, 0x48, &addr_lo);
+	region.start = addr_lo & ~0x7fff;
+	pci_read_config_dword(host, 0x4c, &addr_hi);
+	region.start |= (dma_addr_t) addr_hi << 32;
+	region.end = region.start + 32*1024 - 1;
+
+	memset(&mch, 0, sizeof(mch));
+	mch.flags = IORESOURCE_MEM;
+	pcibios_bus_to_resource(host->bus, &mch, &region);
+
+	list_for_each_entry(pnp_res, &dev->resources, list) {
+		res = &pnp_res->res;
+		if (res->end < mch.start || res->start > mch.end)
+			continue;	/* no overlap */
+		if (res->start == mch.start && res->end == mch.end)
+			continue;	/* exact match */
+
+		dev_info(&dev->dev, FW_BUG
+			 "%pR covers only part of Intel Haswell MCH; extending to %pR\n",
+			 res, &mch);
+		res->start = mch.start;
+		res->end = mch.end;
+		break;
+	}
+
+	pci_dev_put(host);
+}
+
/*
* PnP Quirks
* Cards or devices that need some tweaking due to incomplete resource info
@@ -364,6 +418,7 @@ static struct pnp_fixup pnp_fixups[] = {
#ifdef CONFIG_AMD_NB
{"PNP0c01", quirk_amd_mmconfig_area},
#endif
+ {"PNP0c02", quirk_intel_haswell_mch},
{""}
};
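
The two lines added in this version (the memset() and the IORESOURCE_MEM
flag) matter if the translation step selects a host bridge window by
resource type, because then the flags must be valid before the call.
That reading is an assumption, not something stated in this thread. A
minimal user-space sketch of the idea, where struct window, bus_to_res(),
make_mem_res() and the RES_TYPE_* values are all invented for
illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical resource type bits, invented for this sketch. */
#define RES_TYPE_IO	0x100ul
#define RES_TYPE_MEM	0x200ul

struct region {
	uint64_t start;
	uint64_t end;
};

struct res {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

struct window {
	unsigned long type;	/* RES_TYPE_IO or RES_TYPE_MEM */
	struct region bus;	/* bus addresses covered by the window */
	uint64_t offset;	/* CPU address = bus address + offset */
};

/*
 * Translate a bus region into res. Only windows whose type equals
 * res->flags are considered, so res->flags must be set by the caller.
 * Returns true if a matching window was found.
 */
static bool bus_to_res(const struct window *w, size_t n,
		       struct res *res, const struct region *region)
{
	uint64_t offset = 0;
	bool found = false;
	size_t i;

	assert(res != NULL && region != NULL);

	for (i = 0; i < n && !found; i++) {
		if (w[i].type != res->flags)
			continue;
		if (region->start >= w[i].bus.start &&
		    region->end <= w[i].bus.end) {
			offset = w[i].offset;
			found = true;
		}
	}
	res->start = region->start + offset;
	res->end = region->end + offset;
	return found;
}

/* Zero the whole struct, then set the type, mirroring the two added lines. */
static struct res make_mem_res(void)
{
	struct res r;

	memset(&r, 0, sizeof(r));
	r.flags = RES_TYPE_MEM;
	return r;
}
```

In this toy model a resource whose flags were never set matches no
window, so the bus address is copied through without the window offset.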

2014-04-16 23:08:33

by Stephane Eranian

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 1:31 PM, Bjorn Helgaas <[email protected]> wrote:
> On Wed, Apr 16, 2014 at 09:04:04PM +0200, Borislav Petkov wrote:
>> On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
>> > Right. Even if we had this long-term solution, we'd still have
>> > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
>> >
>> > We do have a drivers/pnp/quirks.c where we could conceivably adjust
>> > the PNP resource if we found the matching PCI device and MCHBAR. That
>> > should solve Stephane's problem even with the current
>> > drivers/pnp/system.c.
>>
>> Guys, this still triggers in -rc1. Do we have a fix or something
>> testable at least?
>
> Hi Boris,
>
> Can you try the patch below?
>
>
>
> PNP: Work around Haswell BIOS defect in MCH area reporting
>
> From: Bjorn Helgaas <[email protected]>
>
> Work around a Haswell BIOS defect that causes part of the MCH area to be
> unreported.
>
> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
> PNP0C02 resource. The MCH space was 16KB prior to Haswell, but it is 32KB
> in Haswell. Some Haswell BIOSes still report a PNP0C02 resource that is
> only 16KB, which means the rest of the MCH space is consumed but
> unreported.
>
Why are you saying this is Haswell vs. others? I see the problem on my
IvyBridge laptop, like Boris.

> This can cause resource map sanity check warnings or (theoretically) a
> device conflict if we assigned the unreported space to another device.
>
> The Intel perf event uncore driver tripped over this when it claimed the
> MCH region:
>
> resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> Info: mapping multiple BARs. Your kernel is fine.
>
> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
> space, extend it to cover the entire space.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Reported-by: Borislav Petkov <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pnp/quirks.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
> diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
> index 258fef272ea7..023edf592371 100644
> --- a/drivers/pnp/quirks.c
> +++ b/drivers/pnp/quirks.c
> @@ -334,6 +334,57 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
> }
> #endif
>
> +static void quirk_intel_haswell_mch(struct pnp_dev *dev)
> +{
> + struct pci_dev *host;
> + u32 addr_lo, addr_hi;
> + struct pci_bus_region region;
> + struct resource mch;
> + struct pnp_resource *pnp_res;
> + struct resource *res;
> +
> + host = pci_get_device(PCI_VENDOR_ID_INTEL, 0x0c00, NULL);
> + if (!host)
> + return;
> +
> + /*
> + * MCHBAR is not an architected PCI BAR, so MCH space is usually
> + * reported as a PNP0C02 resource. The MCH space was 16KB prior to
> + * Haswell, but it is 32KB in Haswell. Some Haswell BIOSes still
> + * report a PNP0C02 resource that is only 16KB, which means the
> + * rest of the MCH space is consumed but unreported.
> + */
> +
> + /*
> + * Read MCHBAR for Host Member Mapped Register Range Base
> + * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
> + * Sec 3.1.12.
> + */
> + pci_read_config_dword(host, 0x48, &addr_lo);
> + region.start = addr_lo & ~0x7fff;
> + pci_read_config_dword(host, 0x4c, &addr_hi);
> + region.start |= (dma_addr_t) addr_hi << 32;
> + region.end = region.start + 32*1024 - 1 ;
> + pcibios_bus_to_resource(host->bus, &mch, &region);
> +
> + list_for_each_entry(pnp_res, &dev->resources, list) {
> + res = &pnp_res->res;
> + if (res->end < mch.start || res->start > mch.end)
> + continue; /* no overlap */
> + if (res->start == mch.start && res->end == mch.end)
> + continue; /* exact match */
> +
> + dev_info(&dev->dev, FW_BUG
> + "%pR covers only part of Intel Haswell MCH; extending to %pR\n",
> + res, &mch);
> + res->start = mch.start;
> + res->end = mch.end;
> + break;
> + }
> +
> + pci_dev_put(host);
> +}
> +
> /*
> * PnP Quirks
> * Cards or devices that need some tweaking due to incomplete resource info
> @@ -364,6 +415,7 @@ static struct pnp_fixup pnp_fixups[] = {
> #ifdef CONFIG_AMD_NB
> {"PNP0c01", quirk_amd_mmconfig_area},
> #endif
> + {"PNP0c02", quirk_intel_haswell_mch},
> {""}
> };
>

2014-04-16 23:11:51

by Bjorn Helgaas

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 5:08 PM, Stephane Eranian <[email protected]> wrote:
> On Wed, Apr 16, 2014 at 1:31 PM, Bjorn Helgaas <[email protected]> wrote:
>> On Wed, Apr 16, 2014 at 09:04:04PM +0200, Borislav Petkov wrote:
>>> On Thu, Mar 20, 2014 at 02:48:30PM -0600, Bjorn Helgaas wrote:
>>> > Right. Even if we had this long-term solution, we'd still have
>>> > Stephane's current problem, because the PNP0C02 _CRS is still wrong.
>>> >
>>> > We do have a drivers/pnp/quirks.c where we could conceivably adjust
>>> > the PNP resource if we found the matching PCI device and MCHBAR. That
>>> > should solve Stephane's problem even with the current
>>> > drivers/pnp/system.c.
>>>
>>> Guys, this still triggers in -rc1. Do we have a fix or something
>>> testable at least?
>>
>> Hi Boris,
>>
>> Can you try the patch below?
>>
>>
>>
>> PNP: Work around Haswell BIOS defect in MCH area reporting
>>
>> From: Bjorn Helgaas <[email protected]>
>>
>> Work around a Haswell BIOS defect that causes part of the MCH area to be
>> unreported.
>>
>> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
>> PNP0C02 resource. The MCH space was 16KB prior to Haswell, but it is 32KB
>> in Haswell. Some Haswell BIOSes still report a PNP0C02 resource that is
>> only 16KB, which means the rest of the MCH space is consumed but
>> unreported.
>>
> Why are you saying this is Haswell vs. others? I see the problem on my
> IvyBridge laptop, like Boris.

Ah, good question. Somewhere I got pointed to the Haswell docs, which
say 32KB. I don't know what other parts have 32KB MCH spaces. If we
could figure out a list of device IDs with 32KB spaces, we could add
that to the quirk.

But I don't know how to come up with a complete list.

Bjorn

2014-04-17 00:19:28

by Dave Jones

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Wed, Apr 16, 2014 at 04:56:00PM -0600, Bjorn Helgaas wrote:

> > I'm seeing the exact same message on my thinkpad t430s.
> > When I try your patch, modesetting no longer works. When it tries
> > to change to the framebuffer I get a black screen and lockup.
> > If I boot with nomodeset it locks up when it gets to X.
> > It all scrolls by too fast to read, but it looks like there's still
> > a backtrace present.
>
> Ouch, sorry about that. I do see a bug in my patch (fixed below), but I
> don't see how that could cause what you're seeing.

updated diff made no difference fwiw.

> Maybe I could figure
> out something from this info (this can be from a kernel without my patch):
>
> - dmesg log
> - output of "find /sys/devices/pnp0 -name id -o -name resources | xargs grep ."
> - output of "sudo lspci -s00:00.0 -xxx"

attached from a fedora build of rc1.

Dave


Attachments:
(No filename) (903.00 B)
pnp.txt (3.93 kB)
pci (920.00 B)
dmesg (36.51 kB)

2014-04-17 10:45:40

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Hi Bjorn,

thanks for the patch, a couple of notes below:

On Wed, Apr 16, 2014 at 04:56:00PM -0600, Bjorn Helgaas wrote:
> PNP: Work around Haswell BIOS defect in MCH area reporting
>
> From: Bjorn Helgaas <[email protected]>
>
> Work around a Haswell BIOS defect that causes part of the MCH area to be
> unreported.

Yep, what Stephane said, this is not HSW only.

> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
> PNP0C02 resource. The MCH space was 16KB prior to Haswell, but it is 32KB
> in Haswell. Some Haswell BIOSes still report a PNP0C02 resource that is
> only 16KB, which means the rest of the MCH space is consumed but
> unreported.
>
> This can cause resource map sanity check warnings or (theoretically) a
> device conflict if we assigned the unreported space to another device.
>
> The Intel perf event uncore driver tripped over this when it claimed the
> MCH region:
>
> resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> Info: mapping multiple BARs. Your kernel is fine.
>
> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
> space, extend it to cover the entire space.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Reported-by: Borislav Petkov <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pnp/quirks.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
>
> diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
> index 258fef272ea7..8402088d4145 100644
> --- a/drivers/pnp/quirks.c
> +++ b/drivers/pnp/quirks.c
> @@ -334,6 +334,60 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
> }
> #endif
>
> +static void quirk_intel_haswell_mch(struct pnp_dev *dev)
> +{
> + struct pci_dev *host;
> + u32 addr_lo, addr_hi;
> + struct pci_bus_region region;
> + struct resource mch;
> + struct pnp_resource *pnp_res;
> + struct resource *res;
> +
> + host = pci_get_device(PCI_VENDOR_ID_INTEL, 0x0c00, NULL);

And because it is not HSW only, this PCI device ID doesn't match on my
IVB system. On mine the hostbridge is

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
Subsystem: Lenovo Device 21fa
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: <access denied>
Kernel driver in use: ivb_uncore
00: 86 80 54 01 06 00 90 20 09 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 fa 21
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00

and from looking at Dave's, it is the same one, so PCI device ID is
0x154.
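
For reference, the ID can be read straight off the first row of the hex
dump: PCI config space is little-endian, with the vendor ID at offset
0x00 and the device ID at offset 0x02, so "86 80 54 01" decodes to
8086:0154. A minimal userspace sketch of that decoding follows; the
helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit field from a raw config-space dump. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t len, size_t off)
{
	assert(off + 1 < len);	/* both bytes must be inside the dump */
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Offset 0x00 holds the vendor ID in the standard PCI header. */
static uint16_t cfg_vendor_id(const uint8_t *cfg, size_t len)
{
	return cfg_read16(cfg, len, 0x00);
}

/* Offset 0x02 holds the device ID in the standard PCI header. */
static uint16_t cfg_device_id(const uint8_t *cfg, size_t len)
{
	return cfg_read16(cfg, len, 0x02);
}
```

Feeding it the four bytes 0x86 0x80 0x54 0x01 from the dump gives vendor
0x8086 and device 0x0154.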

With that changed to

host = pci_get_device(PCI_VENDOR_ID_INTEL, 0x0154, NULL);

and a bit of debugging code, it says now:

[ 0.235739] quirk_intel_haswell_mch: entry
[ 0.235800] quirk_intel_haswell_mch: got host: 0x0
[ 0.235860] quirk_intel_haswell_mch: mch: [mem 0xfed10000-0xfed17fff]
[ 0.235930] quirk_intel_haswell_mch: res: [mem 0xfed10000-0xfed13fff]
[ 0.235990] pnp 00:01: [Firmware Bug]: [mem 0xfed10000-0xfed13fff] covers only part of Intel Haswell MCH; extending to [mem 0xfed10000-0xfed17fff]

So you probably want to have a list of hostbridge pci ids in the quirk
or so.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 18:34:40

by Bjorn Helgaas

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

Thanks a lot for testing this out and debugging my issues.

Here's a new version that looks for both device IDs I know about.

I'm still nervous about the modeset problem Dave is seeing. Since the
original patch wouldn't find an 8086:0c00 device on Dave's system, it
should have done nothing. But since it caused a modesetting problem,
there's something else going on that I don't understand.

Bjorn



PNP: Work around BIOS defects in Intel MCH area reporting

From: Bjorn Helgaas <[email protected]>

Work around BIOSes that don't report the entire Intel MCH area.

MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
PNP0C02 resource. The MCH space was once 16KB, but is 32KB in newer parts.
Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
the rest of the MCH space is consumed but unreported.

This can cause resource map sanity check warnings or (theoretically) a
device conflict if we assigned the unreported space to another device.

The Intel perf event uncore driver tripped over this when it claimed the
MCH region:

resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
Info: mapping multiple BARs. Your kernel is fine.

To prevent this, if we find a PNP0C02 resource that covers part of the MCH
space, extend it to cover the entire space.

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pnp/quirks.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index 258fef272ea7..403bd5c42ed1 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -334,6 +334,79 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
}
#endif

+/* Device IDs of parts that have 32KB MCH space */
+static const unsigned int mch_quirk_devices[] = {
+ 0x0154, /* Ivy Bridge */
+ 0x0c00, /* Haswell */
+};
+
+static struct pci_dev *get_intel_host(void)
+{
+ int i;
+ struct pci_dev *host;
+
+ for (i = 0; i < ARRAY_SIZE(mch_quirk_devices); i++) {
+ host = pci_get_device(PCI_VENDOR_ID_INTEL, mch_quirk_devices[i],
+ NULL);
+ if (host)
+ return host;
+ }
+ return NULL;
+}
+
+static void quirk_intel_mch(struct pnp_dev *dev)
+{
+ struct pci_dev *host;
+ u32 addr_lo, addr_hi;
+ struct pci_bus_region region;
+ struct resource mch;
+ struct pnp_resource *pnp_res;
+ struct resource *res;
+
+ host = get_intel_host();
+ if (!host)
+ return;
+
+ /*
+ * MCHBAR is not an architected PCI BAR, so MCH space is usually
+ * reported as a PNP0C02 resource. The MCH space was originally
+ * 16KB, but is 32KB in newer parts. Some BIOSes still report a
+ * PNP0C02 resource that is only 16KB, which means the rest of the
+ * MCH space is consumed but unreported.
+ */
+
+ /*
+ * Read MCHBAR for Host Memory Mapped Register Range Base
+ * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
+ * Sec 3.1.12.
+ */
+ pci_read_config_dword(host, 0x48, &addr_lo);
+ region.start = addr_lo & ~0x7fff;
+ pci_read_config_dword(host, 0x4c, &addr_hi);
+ region.start |= (dma_addr_t) addr_hi << 32;
+ region.end = region.start + 32*1024 - 1 ;
+
+ memset(&mch, 0, sizeof(mch));
+ mch.flags = IORESOURCE_MEM;
+ pcibios_bus_to_resource(host->bus, &mch, &region);
+
+ list_for_each_entry(pnp_res, &dev->resources, list) {
+ res = &pnp_res->res;
+ if (res->end < mch.start || res->start > mch.end)
+ continue; /* no overlap */
+ if (res->start == mch.start && res->end == mch.end)
+ continue; /* exact match */
+
+ dev_info(&dev->dev, FW_BUG "PNP resource %pR covers only part of %s Intel MCH; extending to %pR\n",
+ res, pci_name(host), &mch);
+ res->start = mch.start;
+ res->end = mch.end;
+ break;
+ }
+
+ pci_dev_put(host);
+}
+
/*
* PnP Quirks
* Cards or devices that need some tweaking due to incomplete resource info
@@ -364,6 +437,7 @@ static struct pnp_fixup pnp_fixups[] = {
#ifdef CONFIG_AMD_NB
{"PNP0c01", quirk_amd_mmconfig_area},
#endif
+ {"PNP0c02", quirk_intel_mch},
{""}
};
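
The resource loop in the patch above reduces to an inclusive-range
overlap test followed by an overwrite. A standalone sketch of that logic,
using plain structs instead of struct resource (all names here are
hypothetical), can be checked against the addresses from the warning:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range standing in for struct resource. */
struct res_range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool ranges_overlap(const struct res_range *a,
			   const struct res_range *b)
{
	return !(a->end < b->start || a->start > b->end);
}

/*
 * Mirror of the quirk's loop body: leave the resource alone if it does
 * not touch the MCH window or already matches it exactly, otherwise
 * overwrite it with the full window. Returns true if res was changed.
 */
static bool extend_partial(struct res_range *res,
			   const struct res_range *mch)
{
	assert(res->start <= res->end && mch->start <= mch->end);

	if (!ranges_overlap(res, mch))
		return false;	/* no overlap */
	if (res->start == mch->start && res->end == mch->end)
		return false;	/* exact match */

	*res = *mch;
	return true;
}
```

For the reported PNP resource 0xfed10000-0xfed13fff and an MCH window of
0xfed10000-0xfed17fff, the resource is extended to the full window, and
a second call is a no-op. Note that, like the patch, this overwrites
rather than takes a union, so a resource that already covers more than
the MCH window would end up smaller.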

2014-04-17 19:49:09

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
> Thanks a lot for testing this out and debugging my issues.
>
> Here's a new version that looks for both device IDs I know about.
>
> I'm still nervous about the modeset problem Dave is seeing. Since the
> original patch wouldn't find an 8086:0c00 device on Dave's system, it
> should have done nothing. But since it caused a modesetting problem,
> there's something else going on that I don't understand.

Yeah, this is strange, to put it mildly. This quirk wouldn't have done
anything besides the iteration over the pci devices with pci_get_device.
Which wouldn't do anything (refcount increment or so) if it didn't find
the device, right?

Bah, today is the day of the strange bugs. :-\

> PNP: Work around BIOS defects in Intel MCH area reporting
>
> From: Bjorn Helgaas <[email protected]>
>
> Work around BIOSes that don't report the entire Intel MCH area.
>
> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
> PNP0C02 resource. The MCH space was once 16KB, but is 32KB in newer parts.
> Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
> the rest of the MCH space is consumed but unreported.
>
> This can cause resource map sanity check warnings or (theoretically) a
> device conflict if we assigned the unreported space to another device.
>
> The Intel perf event uncore driver tripped over this when it claimed the
> MCH region:
>
> resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
> Info: mapping multiple BARs. Your kernel is fine.
>
> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
> space, extend it to cover the entire space.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Reported-by: Borislav Petkov <[email protected]>

Yep, this one works fine:

[ 0.403855] pnp 00:01: [Firmware Bug]: PNP resource [mem 0xfed10000-0xfed13fff] covers only part of 0000:00:00.0 Intel MCH; extending to [mem 0xfed10000-0xfed17fff]

Acked-by: Borislav Petkov <[email protected]>
Tested-by: Borislav Petkov <[email protected]>

Just a minor nitpick below.

> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pnp/quirks.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
>
> diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
> index 258fef272ea7..403bd5c42ed1 100644
> --- a/drivers/pnp/quirks.c
> +++ b/drivers/pnp/quirks.c
> @@ -334,6 +334,79 @@ static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
> }
> #endif
>
> +/* Device IDs of parts that have 32KB MCH space */
> +static const unsigned int mch_quirk_devices[] = {
> + 0x0154, /* Ivy Bridge */
> + 0x0c00, /* Haswell */
> +};
> +
> +static struct pci_dev *get_intel_host(void)
> +{
> + int i;
> + struct pci_dev *host;
> +
> + for (i = 0; i < ARRAY_SIZE(mch_quirk_devices); i++) {
> + host = pci_get_device(PCI_VENDOR_ID_INTEL, mch_quirk_devices[i],
> + NULL);
> + if (host)
> + return host;
> + }
> + return NULL;
> +}
> +
> +static void quirk_intel_mch(struct pnp_dev *dev)
> +{
> + struct pci_dev *host;
> + u32 addr_lo, addr_hi;
> + struct pci_bus_region region;
> + struct resource mch;
> + struct pnp_resource *pnp_res;
> + struct resource *res;
> +
> + host = get_intel_host();
> + if (!host)
> + return;
> +
> + /*
> + * MCHBAR is not an architected PCI BAR, so MCH space is usually
> + * reported as a PNP0C02 resource. The MCH space was originally
> + * 16KB, but is 32KB in newer parts. Some BIOSes still report a
> + * PNP0C02 resource that is only 16KB, which means the rest of the
> + * MCH space is consumed but unreported.
> + */
> +
> + /*
> + * Read MCHBAR for Host Memory Mapped Register Range Base
> + * https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-desktop-vol-2-datasheet
> + * Sec 3.1.12.
> + */
> + pci_read_config_dword(host, 0x48, &addr_lo);
> + region.start = addr_lo & ~0x7fff;
> + pci_read_config_dword(host, 0x4c, &addr_hi);
> + region.start |= (dma_addr_t) addr_hi << 32;
> + region.end = region.start + 32*1024 - 1 ;

checkpatch complains about a trailing space before the semicolon.

> +
> + memset(&mch, 0, sizeof(mch));
> + mch.flags = IORESOURCE_MEM;
> + pcibios_bus_to_resource(host->bus, &mch, &region);
> +
> + list_for_each_entry(pnp_res, &dev->resources, list) {
> + res = &pnp_res->res;
> + if (res->end < mch.start || res->start > mch.end)
> + continue; /* no overlap */
> + if (res->start == mch.start && res->end == mch.end)
> + continue; /* exact match */
> +
> + dev_info(&dev->dev, FW_BUG "PNP resource %pR covers only part of %s Intel MCH; extending to %pR\n",
> + res, pci_name(host), &mch);
> + res->start = mch.start;
> + res->end = mch.end;
> + break;
> + }
> +
> + pci_dev_put(host);
> +}
> +
> /*
> * PnP Quirks
> * Cards or devices that need some tweaking due to incomplete resource info
> @@ -364,6 +437,7 @@ static struct pnp_fixup pnp_fixups[] = {
> #ifdef CONFIG_AMD_NB
> {"PNP0c01", quirk_amd_mmconfig_area},
> #endif
> + {"PNP0c02", quirk_intel_mch},
> {""}
> };

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 19:53:18

by Dave Jones

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
> Thanks a lot for testing this out and debugging my issues.
>
> Here's a new version that looks for both device IDs I know about.

I can confirm this patch does fix the backtrace.
I disabled lockdep, and now I can get to X each boot, but I still see
a black screen rather than a console between modesetting becoming active, and X starting.

(The lockdep thing turned out to be a known XFS false positive, but for
some reason it actually caused X to lock up)

> I'm still nervous about the modeset problem Dave is seeing. Since the
> original patch wouldn't find an 8086:0c00 device on Dave's system, it
> should have done nothing. But since it caused a modesetting problem,
> there's something else going on that I don't understand.

I don't know if it's relevant, but this laptop (and I suspect many other
thinkpads which seem affected) has dual gfx, and both show up on the bus,
even though the nvidia isn't in use.

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Lenovo Device 2200
Flags: bus master, fast devsel, latency 0, IRQ 44
Memory at f1000000 (64-bit, non-prefetchable) [size=4M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
I/O ports at 6000 [size=64]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915

01:00.0 3D controller: NVIDIA Corporation GF117M [GeForce 610M/710M/820M / GT 620M/625M/630M/720M] (rev a1)
Subsystem: Lenovo NVS 5200M
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 5000 [size=128]
Expansion ROM at <ignored> [disabled]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>

Just as X starts up, I see this in dmesg..

[ 42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

Dave

2014-04-17 20:01:42

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
> Just as X starts up, I see this in dmesg..
>
> [ 42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

FWIW, I have that too. It should be something i915-related:

[ 0.617673] [drm] Memory usable by graphics device = 2048M
[ 0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
[ 0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 0.694631] [drm] Driver supports precise vblank timestamp query.
[ 0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[ 0.799829] fbcon: inteldrmfb (fb0) is primary device
[ 1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-17 20:04:32

by Dave Jones

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 10:01:27PM +0200, Borislav Petkov wrote:
> On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
> > Just as X starts up, I see this in dmesg..
> >
> > [ 42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
>
> FWIW, I have that too. It should be something i915-related:
>
> [ 0.617673] [drm] Memory usable by graphics device = 2048M
> [ 0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
> [ 0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 0.694631] [drm] Driver supports precise vblank timestamp query.
> [ 0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [ 0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
> [ 0.799829] fbcon: inteldrmfb (fb0) is primary device
> [ 1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun

Can you send me your .config off-list ?
I wonder if this is something config specific that's causing me to see
this, and you not, given we've apparently got similar machines.

Dave

2014-04-17 20:11:23

by Bjorn Helgaas

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 1:48 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 17, 2014 at 12:26:37PM -0600, Bjorn Helgaas wrote:
>> Thanks a lot for testing this out and debugging my issues.
>>
>> Here's a new version that looks for both device IDs I know about.
>>
>> I'm still nervous about the modeset problem Dave is seeing. Since the
>> original patch wouldn't find an 8086:0c00 device on Dave's system, it
>> should have done nothing. But since it caused a modesetting problem,
>> there's something else going on that I don't understand.
>
> Yeah, this is strange, to put it mildly. This quirk wouldn't have done
> anything besides the iteration over the pci devices with pci_get_device.
> Which wouldn't do anything (refcount increment or so) if it didn't find
> the device, right?

Right.

> Bah, today is the day of the strange bugs. :-\
>
>> PNP: Work around BIOS defects in Intel MCH area reporting
>>
>> From: Bjorn Helgaas <[email protected]>
>>
>> Work around BIOSes that don't report the entire Intel MCH area.
>>
>> MCHBAR is not an architected PCI BAR, so MCH space is usually reported as a
>> PNP0C02 resource. The MCH space was once 16KB, but is 32KB in newer parts.
>> Some BIOSes still report a PNP0C02 resource that is only 16KB, which means
>> the rest of the MCH space is consumed but unreported.
>>
>> This can cause resource map sanity check warnings or (theoretically) a
>> device conflict if we assigned the unreported space to another device.
>>
>> The Intel perf event uncore driver tripped over this when it claimed the
>> MCH region:
>>
>> resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff pnp 00:01
>> Info: mapping multiple BARs. Your kernel is fine.
>>
>> To prevent this, if we find a PNP0C02 resource that covers part of the MCH
>> space, extend it to cover the entire space.
>>
>> Link: http://lkml.kernel.org/r/[email protected]
>> Reported-by: Borislav Petkov <[email protected]>
>
> Yep, this one works fine:
>
> [ 0.403855] pnp 00:01: [Firmware Bug]: PNP resource [mem 0xfed10000-0xfed13fff] covers only part of 0000:00:00.0 Intel MCH; extending to [mem 0xfed10000-0xfed17fff]
>
> Acked-by: Borislav Petkov <[email protected]>
> Tested-by: Borislav Petkov <[email protected]>

>> + region.end = region.start + 32*1024 - 1 ;

> checkpatch complains about a trailing space before the semicolon.

Thanks! I hate typos like that.

I'll fix this, add your tested-by and ack, and send to Rafael.

Bjorn

2014-04-17 20:54:36

by Dave Jones

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 04:03:52PM -0400, Dave Jones wrote:
> On Thu, Apr 17, 2014 at 10:01:27PM +0200, Borislav Petkov wrote:
> > On Thu, Apr 17, 2014 at 03:52:40PM -0400, Dave Jones wrote:
> > > Just as X starts up, I see this in dmesg..
> > >
> > > [ 42.879049] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
> >
> > FWIW, I have that too. It should be something i915-related:
> >
> > [ 0.617673] [drm] Memory usable by graphics device = 2048M
> > [ 0.694445] i915 0000:00:02.0: irq 42 for MSI/MSI-X
> > [ 0.694549] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > [ 0.694631] [drm] Driver supports precise vblank timestamp query.
> > [ 0.695313] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> > [ 0.788300] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
> > [ 0.799829] fbcon: inteldrmfb (fb0) is primary device
> > [ 1.176845] [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
>
> Can you send me your .config off-list ?
> I wonder if this is something config specific that's causing me to see
> this, and you not, given we've apparently got similar machines.

ok, with your config I get back to a console after the modesetting
switch, but then it hangs in USB init.

Hrmm.

Dave

2014-04-17 21:01:23

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 04:53:55PM -0400, Dave Jones wrote:
> ok, with your config I get back to a console after the modesetting
> switch, but then it hangs in USB init.

Maybe because our machines are not that similar there? Can you take
my config but paste the usb part of yours and see whether it boots fine
then? It could be yours and mine have different USB hw...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2014-04-18 10:39:04

by Borislav Petkov

Subject: Re: Info: mapping multiple BARs. Your kernel is fine.

On Thu, Apr 17, 2014 at 05:30:27PM -0400, Dave Jones wrote:
> I think it's just implicated because that's the next thing that seems
> to init after the modeswitch. The config differences are small, just
> things like =m instead of =y or vice-versa.
>
> I'm about to head into a long weekend, so I'll get back to this on
> Monday, but for now I'm out of ideas.

This is for when you get back: :-)

Can you debug that hang a bit more, like enable some sensible options
under "Kernel Hacking" or somesuch, boot with initcall_debug, add
more printks at key places? If the machine would tell us why exactly
it hangs, we might have an idea, like corruption, transaction stall,
whatever...

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--