On Mon, Jul 07, 2014 at 10:32:56PM +0200, Fabio Coatti wrote:
> I'm seeing this message in latest kernels (this is from 3.15.4, but I have same
> message starting from 3.15.0, IIRC):
So 3.14.0 didn't show this?
If so, can you run 'git bisect' between those two kernel versions to try
to track down the issue?
thanks,
greg k-h
On Monday 7 July 2014 13:47:47, Greg Kroah-Hartman wrote:
> On Mon, Jul 07, 2014 at 10:32:56PM +0200, Fabio Coatti wrote:
> > I'm seeing this message in latest kernels (this is from 3.15.4, but I have
> > same
> > message starting from 3.15.0, IIRC):
> So 3.14.0 didn't show this?
>
> If so, can you run 'git bisect' between those two kernel versions to try
> to track down the issue?
>
> thanks,
>
> greg k-h
Yep, I'll do that. It will take some time, as the last kernel not showing this
message was 3.14.6.
I'll keep you posted, thanks.
--
Fabio
On Monday 7 July 2014 13:47:47, Greg Kroah-Hartman wrote:
> On Mon, Jul 07, 2014 at 10:32:56PM +0200, Fabio Coatti wrote:
> > I'm seeing this message in latest kernels (this is from 3.15.4, but I have
> > same
> > message starting from 3.15.0, IIRC):
> So 3.14.0 didn't show this?
>
> If so, can you run 'git bisect' between those two kernel versions to try
> to track down the issue?
>
> thanks,
>
> greg k-h
OK, I tried to bisect as suggested and got the commit reported below. However,
I'm not sure I got the right one, as one kernel refused to compile during the
last steps. I'm posting the result here anyway in case it is useful.
Tomorrow I can retry the whole process, starting from different commits.
b9e1ab6d4c0582cad97699285a6b3cf992251b00 is the first bad commit
commit b9e1ab6d4c0582cad97699285a6b3cf992251b00
Author: Stephane Eranian <[email protected]>
Date: Tue Feb 11 16:20:12 2014 +0100
perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support
This patch adds a new uncore PMU for Intel SNB/IVB/HSW client
CPUs. It adds the Integrated Memory Controller (IMC) PMU. This
new PMU provides a set of events to measure memory bandwidth utilization.
The IMC on those processor is PCI-space based. This patch
exposes a new uncore PMU on those processor: uncore_imc
Two new events are defined:
- name: data_reads
- code: 0x1
- unit: 64 bytes
- number of full cacheline read requests to the IMC
- name: data_writes
- code: 0x2
- unit: 64 bytes
- number of full cacheline write requests to the IMC
Documentation available at:
http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
:040000 040000 2d628022cbc4b8969a2ec311082053510cf6eed5 79b2a1cc3ed29e4820a5ae4221b4cf603c138887 M arch
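As an aside on the events listed in the commit message above: both counters
are in units of 64 bytes, so a raw count has to be scaled before it reads as
a byte total or a bandwidth. A minimal sketch of that arithmetic in C follows.
The macro and function names are hypothetical, made up for illustration; only
the event codes and the 64-byte unit come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Event codes and unit as stated in the commit message. */
#define IMC_EVENT_DATA_READS   0x1
#define IMC_EVENT_DATA_WRITES  0x2
#define IMC_COUNT_UNIT_BYTES   64u

/* Hypothetical helper: convert a raw event count to bytes. */
uint64_t imc_count_to_bytes(uint64_t count)
{
    return count * IMC_COUNT_UNIT_BYTES;
}

/* Hypothetical helper: average bandwidth in MiB/s over an interval. */
double imc_bandwidth_mib_per_s(uint64_t count, double seconds)
{
    assert(seconds > 0.0);
    return (double)imc_count_to_bytes(count) / seconds / (1024.0 * 1024.0);
}
```

For example, a data_reads count of 16384 over one second corresponds to
16384 * 64 = 1048576 bytes, i.e. 1 MiB/s of full-cacheline reads.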
--
Fabio
On Wed, Jul 09, 2014 at 08:41:17PM +0200, Fabio Coatti wrote:
> On Monday 7 July 2014 13:47:47, Greg Kroah-Hartman wrote:
> > On Mon, Jul 07, 2014 at 10:32:56PM +0200, Fabio Coatti wrote:
> > > I'm seeing this message in latest kernels (this is from 3.15.4, but I have
> > > same
> > > message starting from 3.15.0, IIRC):
> > So 3.14.0 didn't show this?
> >
> > If so, can you run 'git bisect' between those two kernel versions to try
> > to track down the issue?
> >
> > thanks,
> >
> > greg k-h
> ok, I tried to bisect as suggested and got the commit reported below. However
> I'm not really sure to have got the right one, as one kernel refused to
> compile during the last steps. However I post here the result, maybe they can
> be useful.
Try cc:ing everyone on that patch, with the original information you
provided, and the linux-kernel mailing list. Those developers should be
able to help you out properly.
thanks,
greg k-h
On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> Try cc:ing everyone on that patch, with the original information you
> provided, and the linux-kernel mailing list. Those developers should be
> able to help you out properly.
OK, here is the description of a problem that I've been experiencing on
recent kernels since 3.15.0 (this report comes from 3.15.4):
lug 07 22:08:00 calvin kernel: resource map sanity check conflict: 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff reserved
lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
lug 07 22:08:00 calvin kernel: WARNING: CPU: 2 PID: 1 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x290/0x2fa()
lug 07 22:08:00 calvin kernel: Info: mapping multiple BARs. Your kernel is fine.
lug 07 22:08:00 calvin kernel: Modules linked in:
lug 07 22:08:00 calvin kernel:
lug 07 22:08:00 calvin kernel: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.15.4 #1
lug 07 22:08:00 calvin kernel: Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. F.40 02/01/2013
lug 07 22:08:00 calvin kernel: 00009f90000000f7 ffffffff8175d44b ffff8802334fbc78
ffffffff810af219
lug 07 22:08:00 calvin kernel: ffffffff81028e43 ffffc90000070000 ffff8802334fbcc8
00000000fed10000
lug 07 22:08:00 calvin kernel: 00000000fed16000 ffffffff810af275 ffffffff81991de2
0000000000000018
lug 07 22:08:00 calvin kernel: Call Trace:
lug 07 22:08:00 calvin kernel: [<ffffffff8175d44b>] ? dump_stack+0x49/0x6a
lug 07 22:08:00 calvin kernel: [<ffffffff810af219>] ? warn_slowpath_common+0x6f/0x84
lug 07 22:08:00 calvin kernel: [<ffffffff81028e43>] ? __ioremap_caller+0x290/0x2fa
lug 07 22:08:00 calvin kernel: [<ffffffff810af275>] ? warn_slowpath_fmt+0x47/0x49
lug 07 22:08:00 calvin kernel: [<ffffffff810b3932>] ? iomem_map_sanity_check+0xa5/0xb1
lug 07 22:08:00 calvin kernel: [<ffffffff81028e43>] ? __ioremap_caller+0x290/0x2fa
lug 07 22:08:00 calvin kernel: [<ffffffff81016316>] ? snb_uncore_imc_init_box+0x5c/0x7a
lug 07 22:08:00 calvin kernel: [<ffffffff81017e8e>] ? uncore_pci_probe+0x100/0x168
lug 07 22:08:00 calvin kernel: [<ffffffff813c8da9>] ? pci_device_probe+0x6c/0xcb
lug 07 22:08:00 calvin kernel: [<ffffffff814e8f46>] ? driver_probe_device+0x9b/0x1ce
lug 07 22:08:00 calvin kernel: [<ffffffff814e90fd>] ? __driver_attach+0x53/0x73
lug 07 22:08:00 calvin kernel: [<ffffffff814e90aa>] ? __device_attach+0x31/0x31
lug 07 22:08:00 calvin kernel: [<ffffffff814e7814>] ? bus_for_each_dev+0x6e/0x78
lug 07 22:08:00 calvin kernel: [<ffffffff814e87f9>] ? bus_add_driver+0xfb/0x1c4
lug 07 22:08:00 calvin kernel: [<ffffffff814e95f9>] ? driver_register+0x83/0xbb
lug 07 22:08:00 calvin kernel: [<ffffffff81cb0fb6>] ? uncore_pmu_register+0xd1/0xd1
lug 07 22:08:00 calvin kernel: [<ffffffff81cb1128>] ? intel_uncore_init+0x172/0x41e
lug 07 22:08:00 calvin kernel: [<ffffffff81cb0fb6>] ? uncore_pmu_register+0xd1/0xd1
lug 07 22:08:00 calvin kernel: [<ffffffff8100029f>] ? do_one_initcall+0x88/0x11c
lug 07 22:08:00 calvin kernel: [<ffffffff810c41ac>] ? parse_args+0x17f/0x23b
lug 07 22:08:00 calvin kernel: [<ffffffff81ca8e1f>] ? kernel_init_freeable+0x14f/0x1d1
lug 07 22:08:00 calvin kernel: [<ffffffff81ca86b2>] ? do_early_param+0x81/0x81
lug 07 22:08:00 calvin kernel: [<ffffffff81756463>] ? rest_init+0x77/0x77
lug 07 22:08:00 calvin kernel: [<ffffffff81756468>] ? kernel_init+0x5/0xd0
lug 07 22:08:00 calvin kernel: [<ffffffff8176493c>] ? ret_from_fork+0x7c/0xb0
lug 07 22:08:00 calvin kernel: [<ffffffff81756463>] ? rest_init+0x77/0x77
lug 07 22:08:00 calvin kernel: ---[ end trace 076d7a33d4c45496 ]---
lug 07 22:08:00 calvin kernel: RAPL PMU detected, hw unit 2^-16 Joules, API unit is 2^-32 Joules, 3 fixed counters 163840 ms ovfl timer
Bisecting led me here:
b9e1ab6d4c0582cad97699285a6b3cf992251b00 is the first bad commit
commit b9e1ab6d4c0582cad97699285a6b3cf992251b00
Author: Stephane Eranian <[email protected]>
Date: Tue Feb 11 16:20:12 2014 +0100
perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support
This patch adds a new uncore PMU for Intel SNB/IVB/HSW client
CPUs. It adds the Integrated Memory Controller (IMC) PMU. This
new PMU provides a set of events to measure memory bandwidth utilization.
The IMC on those processor is PCI-space based. This patch
exposes a new uncore PMU on those processor: uncore_imc
Two new events are defined:
- name: data_reads
- code: 0x1
- unit: 64 bytes
- number of full cacheline read requests to the IMC
- name: data_writes
- code: 0x2
- unit: 64 bytes
- number of full cacheline write requests to the IMC
Documentation available at:
http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
:040000 040000 2d628022cbc4b8969a2ec311082053510cf6eed5 79b2a1cc3ed29e4820a5ae4221b4cf603c138887 M arch
Please note that while bisecting I hit a kernel with a compilation error, so I'm
not 100% sure the result is correct. I can retry the whole process with a
different starting point, but I'm sending the results now in the hope that they
are of some help.
My config.gz is attached.
gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3
Linux calvin 3.15.4 #1 SMP PREEMPT Mon Jul 7 11:18:48 CEST 2014 x86_64 Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz GenuineIntel GNU/Linux
Of course I'm available for any additional information; please cc: me, as I'm
not subscribed to lkml at the moment.
Many thanks.
--
Fabio
On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
>
> > Try cc:ing everyone on that patch, with the original information you
> > provided, and the linux-kernel mailing list. Those developers should be
> > able to help you out properly.
>
> Ok, here you can find the description of a problem that I'm experiencing on
> latest kernels, since 3.15.0. (this report comes from 3.15.4)
>
> lug 07 22:08:00 calvin kernel: resource map sanity check conflict: 0xfed10000
> 0xfed15fff 0xfed10000 0xfed13fff reserved
> lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
I think this is a 'known' issue on Thinkpad (iirc). For some obscure
reason its BIOS has funny ideas about resources etc.
I couldn't quickly find the previous thread; maybe Stephane knows.
On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> >
> > > Try cc:ing everyone on that patch, with the original information you
> > > provided, and the linux-kernel mailing list. Those developers should be
> > > able to help you out properly.
> >
> > Ok, here you can find the description of a problem that I'm experiencing on
> > latest kernels, since 3.15.0. (this report comes from 3.15.4)
> >
> > lug 07 22:08:00 calvin kernel: resource map sanity check conflict: 0xfed10000
> > 0xfed15fff 0xfed10000 0xfed13fff reserved
> > lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
>
> I think this is a 'known' issue on Thinkpad (iirc). For some obscure
> reason its BIOS has funny ideas about resources etc.
>
> I couldn't quickly find the previous thread; maybe Stephane knows.
Found it:
lkml.kernel.org/r/[email protected]
On Thu, Jul 10, 2014 at 10:52:08AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> > > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> > >
> > > > Try cc:ing everyone on that patch, with the original information you
> > > > provided, and the linux-kernel mailing list. Those developers should be
> > > > able to help you out properly.
> > >
> > > Ok, here you can find the description of a problem that I'm experiencing on
> > > latest kernels, since 3.15.0. (this report comes from 3.15.4)
> > >
> > > lug 07 22:08:00 calvin kernel: resource map sanity check conflict: 0xfed10000
> > > 0xfed15fff 0xfed10000 0xfed13fff reserved
> > > lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
> >
> > I think this is a 'known' issue on Thinkpad (iirc). For some obscure
> > reason its BIOS has funny ideas about resources etc.
> >
> > I couldn't quickly find the previous thread; maybe Stephane knows.
>
> Found it:
>
> lkml.kernel.org/r/[email protected]
OK, I reread that thread and I don't think a definite conclusion was reached.
It's just Thinkpad firmware having overlapping resources.
On Thursday 10 July 2014 10:54:48, Peter Zijlstra wrote:
> On Thu, Jul 10, 2014 at 10:52:08AM +0200, Peter Zijlstra wrote:
> > On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> > > > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> > > > > Try cc:ing everyone on that patch, with the original information you
> > > > > provided, and the linux-kernel mailing list. Those developers
> > > > > should be
> > > > > able to help you out properly.
> > > >
> > > > Ok, here you can find the description of a problem that I'm
> > > > experiencing on
> > > > latest kernels, since 3.15.0. (this report comes from 3.15.4)
> > > >
> > > > lug 07 22:08:00 calvin kernel: resource map sanity check conflict:
> > > > 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff reserved
> > > > lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
> > >
> > > I think this is a 'known' issue on Thinkpad (iirc). For some obscure
> > > reason its BIOS has funny ideas about resources etc.
> > >
> > > I couldn't quickly find the previous thread; maybe Stephane knows.
> >
> > Found it:
> > lkml.kernel.org/r/[email protected]
>
> Ok, reread that thread and no definite conclusion was found I think.
> Just Thinkpad firmware having overlapping resources.
So I guess this also happens on the HP Folio 9470m, not only on Thinkpads. I
wonder how many machines share this behaviour...
--
Fabio
On Thu, Jul 10, 2014 at 2:13 PM, Fabio Coatti <[email protected]> wrote:
> On Thursday 10 July 2014 10:54:48, Peter Zijlstra wrote:
>> On Thu, Jul 10, 2014 at 10:52:08AM +0200, Peter Zijlstra wrote:
>> > On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
>> > > On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
>> > > > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
>> > > > > Try cc:ing everyone on that patch, with the original information you
>> > > > > provided, and the linux-kernel mailing list. Those developers
>> > > > > should be
>> > > > > able to help you out properly.
>> > > >
>> > > > Ok, here you can find the description of a problem that I'm
>> > > > experiencing on
>> > > > latest kernels, since 3.15.0. (this report comes from 3.15.4)
>> > > >
>> > > > lug 07 22:08:00 calvin kernel: resource map sanity check conflict:
>> > > > 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff reserved
>> > > > lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
>> > >
>> > > I think this is a 'known' issue on Thinkpad (iirc). For some obscure
>> > > reason its BIOS has funny ideas about resources etc.
>> > >
>> > > I couldn't quickly find the previous thread; maybe Stephane knows.
>> >
>> > Found it:
>> > lkml.kernel.org/r/[email protected]
>>
>> Ok, reread that thread and no definite conclusion was found I think.
>> Just Thinkpad firmware having overlapping resources.
>
>
> So I guess that this happens also for HP Folio 9470m, not only in thinkpads. I
> wonder how many machines shares this behaviour...
>
Somehow, I thought that Bjorn had proposed a fix for this.
On Thu, Jul 10, 2014 at 09:12:44PM +0200, Stephane Eranian wrote:
> On Thu, Jul 10, 2014 at 2:13 PM, Fabio Coatti <[email protected]> wrote:
> > On Thursday 10 July 2014 10:54:48, Peter Zijlstra wrote:
> >> On Thu, Jul 10, 2014 at 10:52:08AM +0200, Peter Zijlstra wrote:
> >> > On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
> >> > > On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> >> > > > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> >> > > > > Try cc:ing everyone on that patch, with the original information you
> >> > > > > provided, and the linux-kernel mailing list. Those developers
> >> > > > > should be
> >> > > > > able to help you out properly.
> >> > > >
> >> > > > Ok, here you can find the description of a problem that I'm
> >> > > > experiencing on
> >> > > > latest kernels, since 3.15.0. (this report comes from 3.15.4)
> >> > > >
> >> > > > lug 07 22:08:00 calvin kernel: resource map sanity check conflict:
> >> > > > 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff reserved
> >> > > > lug 07 22:08:00 calvin kernel: ------------[ cut here ]------------
> >> > >
> >> > > I think this is a 'known' issue on Thinkpad (iirc). For some obscure
> >> > > reason its BIOS has funny ideas about resources etc.
> >> > >
> >> > > I couldn't quickly find the previous thread; maybe Stephane knows.
> >> >
> >> > Found it:
> >> > lkml.kernel.org/r/[email protected]
> >>
> >> Ok, reread that thread and no definite conclusion was found I think.
> >> Just Thinkpad firmware having overlapping resources.
> >
> >
> > So I guess that this happens also for HP Folio 9470m, not only in thinkpads. I
> > wonder how many machines shares this behaviour...
> >
> Somehow, I thought that Bjorn had proposed a fix for this.
Yep, this [1]:
cb171f7abb9a PNP: Work around BIOS defects in Intel MCH area reporting
which appeared in v3.15, so it should be in your kernel. But apparently
it didn't work. Fabio, can you pastebin your complete dmesg log?
[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cb171f7abb9a
On Thursday 10 July 2014 14:05:27, Bjorn Helgaas wrote:
> On Thu, Jul 10, 2014 at 09:12:44PM +0200, Stephane Eranian wrote:
> > On Thu, Jul 10, 2014 at 2:13 PM, Fabio Coatti <[email protected]> wrote:
> > > On Thursday 10 July 2014 10:54:48, Peter Zijlstra wrote:
> > >> On Thu, Jul 10, 2014 at 10:52:08AM +0200, Peter Zijlstra wrote:
> > >> > On Thu, Jul 10, 2014 at 10:44:21AM +0200, Peter Zijlstra wrote:
> > >> > > On Wed, Jul 09, 2014 at 12:48:05PM -0700, Fabio Coatti wrote:
> > >> > > > On Wednesday 9 July 2014 11:54:21, Greg Kroah-Hartman wrote:
> > >> > > > > Try cc:ing everyone on that patch, with the original
> > >> > > > > information you
> > >> > > > > provided, and the linux-kernel mailing list. Those developers
> > >> > > > > should be
> > >> > > > > able to help you out properly.
> > >> > > >
> > >> > > > Ok, here you can find the description of a problem that I'm
> > >> > > > experiencing on
> > >> > > > latest kernels, since 3.15.0. (this report comes from 3.15.4)
> > >> > > >
> > >> > > > lug 07 22:08:00 calvin kernel: resource map sanity check
> > >> > > > conflict:
> > >> > > > 0xfed10000 0xfed15fff 0xfed10000 0xfed13fff reserved
> > >> > > > lug 07 22:08:00 calvin kernel: ------------[ cut here
> > >> > > > ]------------
> > >> > >
> > >> > > I think this is a 'known' issue on Thinkpad (iirc). For some
> > >> > > obscure
> > >> > > reason its BIOS has funny ideas about resources etc.
> > >> > >
> > >> > > I couldn't quickly find the previous thread; maybe Stephane knows.
> > >> >
> > >> > Found it:
> > >> > lkml.kernel.org/r/[email protected]
> > >>
> > >> Ok, reread that thread and no definite conclusion was found I think.
> > >> Just Thinkpad firmware having overlapping resources.
> > >
> > > So I guess that this happens also for HP Folio 9470m, not only in
> > > thinkpads. I wonder how many machines shares this behaviour...
> >
> > Somehow, I thought that Bjorn had proposed a fix for this.
>
> Yep, this [1]:
>
> cb171f7abb9a PNP: Work around BIOS defects in Intel MCH area reporting
>
> which appeared in v3.15, so it should be in your kernel. But apparently
> it didn't work. Fabio, can you pastebin your complete dmesg log?
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cb171f7abb9a
Sure, here you go:
http://pastebin.com/FiL7N64b
--
Fabio
On Fri, Jul 11, 2014 at 1:38 AM, Fabio Coatti <[email protected]> wrote:
> On Thursday 10 July 2014 14:05:27, Bjorn Helgaas wrote:
>> ...
>> Fabio, can you pastebin your complete dmesg log?
>
> Sure, here you go:
>
> http://pastebin.com/FiL7N64b
I opened this bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=80041 and attached your
dmesg to it. I see what the problem is, but I don't have a good idea
yet for how to fix it.
The problem is that we don't handle e820 and PNP device resource
information correctly. From the attached dmesg, we have this:
BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff] reserved
system 00:00: [mem 0xfed10000-0xfed17fff] could not be reserved
The 00:00 PNP device describes the correct 32K range for the Intel MCH
(see [1] for details). But the [mem 0xfed10000-0xfed13fff] entry from
e820 was added to the resource map first, and it covers only the first
16K of the MCH range. This caused the subsequent PNP reservation to
fail. Then the snb_uncore_imc_init_box() reservation caused the
warning, because it would be a child of the e820 entry but it covers
more space.
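To make the failure mode concrete, here is a minimal sketch in C of the
containment rule described above. It is a toy model under stated assumptions,
not the kernel's resource code: struct range, enum fit and classify() are
hypothetical names made up for illustration, and ranges are inclusive
[start, end] pairs like the ones printed in the dmesg.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one entry in a resource tree; inclusive [start, end]. */
struct range {
    uint64_t start;
    uint64_t end;
};

enum fit {
    FIT_DISJOINT,  /* shares no addresses with the existing entry */
    FIT_INSIDE,    /* fully contained: could become a child of it */
    FIT_CONFLICT   /* overlaps but is not contained: rejected     */
};

/* Classify a requested range against one existing entry. */
enum fit classify(struct range existing, struct range req)
{
    assert(existing.start <= existing.end);
    assert(req.start <= req.end);

    if (req.end < existing.start || req.start > existing.end)
        return FIT_DISJOINT;
    if (req.start >= existing.start && req.end <= existing.end)
        return FIT_INSIDE;
    return FIT_CONFLICT;
}
```

With the e820 entry [0xfed10000-0xfed13fff] as the existing range, both the
PNP range [0xfed10000-0xfed17fff] and the ioremap request
[0xfed10000-0xfed15fff] from the warning come out as FIT_CONFLICT. Had the
full 32K PNP range been in the tree instead, the same ioremap request would
be FIT_INSIDE.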
[1] fixed a similar issue where the PNP device described only the
first 16K of the MCH range. This case is slightly different because
here it's the e820 entry that is incorrect.
[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cb171f7abb9a
[+cc Yinghai, Rafael]
On Fri, Jul 11, 2014 at 12:11 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, Jul 11, 2014 at 1:38 AM, Fabio Coatti <[email protected]> wrote:
>> On Thursday 10 July 2014 14:05:27, Bjorn Helgaas wrote:
>>> ...
>>> Fabio, can you pastebin your complete dmesg log?
>>
>> Sure, here you go:
>>
>> http://pastebin.com/FiL7N64b
>
> I opened this bugzilla:
> https://bugzilla.kernel.org/show_bug.cgi?id=80041 and attached your
> dmesg to it. I see what the problem is, but I don't have a good idea
> yet for how to fix it.
>
> The problem is that we don't handle e820 and PNP device resource
> information correctly. From the attached dmesg, we have this:
>
> BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff] reserved
> system 00:00: [mem 0xfed10000-0xfed17fff] could not be reserved
>
> The 00:00 PNP device describes the correct 32K range for the Intel MCH
> (see [1] for details). But the [mem 0xfed10000-0xfed13fff] entry from
> e820 was added to the resource map first, and it covers only the first
> 16K of the MCH range. This caused the subsequent PNP reservation to
> fail. Then the snb_uncore_imc_init_box() reservation caused the
> warning, because it would be a child of the e820 entry but it covers
> more space.
>
> [1] fixed a similar issue where the PNP device described only the
> first 16K of the MCH range. This case is slightly different because
> here it's the e820 entry that is incorrect.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cb171f7abb9a
One of the reasons for iomem_resource is so we don't hand out the same
address space to two different devices. We *could* do that by keeping
track of the union of all devices and reserved areas that we know
about.
But the current resource code is more strict: it enforces a hierarchy.
For example, in this case, it rejects the 00:00 PNP resource because
it is larger than the e820 entry. The problem with rejecting it is
that we might hand out [mem 0xfed14000-0xfed17fff] to another device
even though PNP told us that it's in use.
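The range at risk can be worked out mechanically. A minimal sketch in C,
assuming a toy inclusive-range type: struct span and untracked_tail() are
hypothetical names for illustration, not kernel functions. Given the range
that made it into the resource map and the range a device really uses, it
returns the part past the end of the tracked range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy inclusive address range [start, end]. */
struct span {
    uint64_t start;
    uint64_t end;
};

/*
 * If `wanted` begins inside `tracked` and extends past its end, store the
 * part beyond `tracked` in *tail and return true. Otherwise return false
 * and leave *tail untouched.
 */
bool untracked_tail(struct span tracked, struct span wanted, struct span *tail)
{
    assert(tracked.start <= tracked.end);
    assert(wanted.start <= wanted.end);

    if (wanted.start < tracked.start || wanted.start > tracked.end)
        return false;
    if (wanted.end <= tracked.end)
        return false;

    tail->start = tracked.end + 1;
    tail->end = wanted.end;
    return true;
}
```

For tracked = [0xfed10000-0xfed13fff] (the e820 entry) and wanted =
[0xfed10000-0xfed17fff] (the rejected PNP resource), the tail is
[0xfed14000-0xfed17fff], the 16K that could be handed out to another device.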
I'm about to head out for a few weeks of vacation, so I won't be able
to do anything with this.
Bjorn
On Tue, Jul 15, 2014 at 1:33 PM, Bjorn Helgaas <[email protected]> wrote:
> [+cc Yinghai, Rafael]
>
>>> http://pastebin.com/FiL7N64b
>>
>> I opened this bugzilla:
>> https://bugzilla.kernel.org/show_bug.cgi?id=80041 and attached your
>> dmesg to it. I see what the problem is, but I don't have a good idea
>> yet for how to fix it.
>>
>> The problem is that we don't handle e820 and PNP device resource
>> information correctly. From the attached dmesg, we have this:
>>
>> BIOS-e820: [mem 0x00000000fed10000-0x00000000fed13fff] reserved
>> system 00:00: [mem 0xfed10000-0xfed17fff] could not be reserved
>>
>> The 00:00 PNP device describes the correct 32K range for the Intel MCH
>> (see [1] for details). But the [mem 0xfed10000-0xfed13fff] entry from
>> e820 was added to the resource map first, and it covers only the first
>> 16K of the MCH range. This caused the subsequent PNP reservation to
>> fail. Then the snb_uncore_imc_init_box() reservation caused the
>> warning, because it would be a child of the e820 entry but it covers
>> more space.
>>
>> [1] fixed a similar issue where the PNP device described only the
>> first 16K of the MCH range. This case is slightly different because
>> here it's the e820 entry that is incorrect.
>>
>> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cb171f7abb9a
>
> One of the reasons for iomem_resource is so we don't hand out the same
> address space to two different devices. We *could* do that by keeping
> track of the union of all devices and reserved areas that we know
> about.
>
> But the current resource code is more strict: it enforces a hierarchy.
> For example, in this case, it rejects the 00:00 PNP resource because
> it is larger than the e820 entry. The problem with rejecting it is
> that we might hand out [mem 0xfed14000-0xfed17fff] to another device
> even though PNP told us that it's in use.
>
> I'm about to head out for a few weeks of vacation, so I won't be able
> to do anything with this.
In that case, we could reserve the whole MCH range in e820 from
trim_snb_memory() instead.
HPA, what is your idea about it?
Yinghai
On 07/15/2014 04:40 PM, Yinghai Lu wrote:
>>
>> One of the reasons for iomem_resource is so we don't hand out the same
>> address space to two different devices. We *could* do that by keeping
>> track of the union of all devices and reserved areas that we know
>> about.
>>
>> But the current resource code is more strict: it enforces a hierarchy.
>> For example, in this case, it rejects the 00:00 PNP resource because
>> it is larger than the e820 entry. The problem with rejecting it is
>> that we might hand out [mem 0xfed14000-0xfed17fff] to another device
>> even though PNP told us that it's in use.
>>
>> I'm about to head out for a few weeks of vacation, so I won't be able
>> to do anything with this.
>
> In that case, we could reserve the whole MCH range in e820 from
> trim_snb_memory() instead.
>
> HPA, what is your idea about it?
>
> Yinghai
>
We could quirk it, but we would have to make bloody darn sure that we
don't break any systems because of unusual configuration and so on.
I agree that we need to treat fixed resources as equivalent to reserved.
This is also a BIOS bug (it should reserve the whole region), but that
happens far too frequently. I don't know if we have any way to do that
without massive surgery to the current code, though.
-hpa
On Tue, Jul 15, 2014 at 4:54 PM, H. Peter Anvin <[email protected]> wrote:
> On 07/15/2014 04:40 PM, Yinghai Lu wrote:
> We could quirk it, but we would have to make bloody darn sure that we
> don't break any systems because of unusual configuration and so on.
>
> I agree that we need to treat fixed resources as equivalent to reserved.
> This is also a BIOS bug (it should reserve the whole region), but that
> happens far too frequently. I don't know if we have any way to do that
> without massive surgery to the current code, though.
That should be similar to early_gart_iommu_check(), with even less code.
Yinghai