2012-10-29 04:03:59

by Mark Lord

Subject: Regression from 3.4.9 to 3.4.16 "stable" kernel

My server here runs the 3.4.xx series of "stable" kernels.
Until today, it was running 3.4.9.
Today I tried to upgrade it to 3.4.16.
It hangs in setup.c.

I've isolated the fault down to this specific change
that was made between 3.4.9 and 3.4.16.
Reverting this change allows the system to boot/run normally again.


--- linux-3.4.9/arch/x86/kernel/setup.c 2012-08-15 11:17:17.000000000 -0400
+++ linux-3.4.16/arch/x86/kernel/setup.c 2012-10-28 13:36:33.000000000 -0400
@@ -927,8 +927,21 @@

#ifdef CONFIG_X86_64
if (max_pfn > max_low_pfn) {
- max_pfn_mapped = init_memory_mapping(1UL<<32,
- max_pfn<<PAGE_SHIFT);
+ int i;
+ for (i = 0; i < e820.nr_map; i++) {
+ struct e820entry *ei = &e820.map[i];
+
+ if (ei->addr + ei->size <= 1UL << 32)
+ continue;
+
+ if (ei->type == E820_RESERVED)
+ continue;
+
+ max_pfn_mapped = init_memory_mapping(
+ ei->addr < 1UL << 32 ? 1UL << 32 : ei->addr,
+ ei->addr + ei->size);
+ }
+
/* can we preseve max_low_pfn ?*/
max_low_pfn = max_pfn;
}


2012-10-29 06:46:58

by Willy Tarreau

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
> My server here runs the 3.4.xx series of "stable" kernels.
> Until today, it was running 3.4.9.
> Today I tried to upgrade it to 3.4.16.
> It hangs in setup.c.
>
> I've isolated the fault down to this specific change
> that was made between 3.4.9 and 3.4.16.
> Reverting this change allows the system to boot/run normally again.
..

For the record, it is this commit introduced in 3.4.16 :

commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
Author: Jacob Shin <[email protected]>
Date: Thu Oct 20 16:15:26 2011 -0500

x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.

commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.

On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.

[ hpa: this should be done not just for > 4 GB but for everything above the legacy
region (1 MB), at the very least. That, however, turns out to require significant
restructuring. That work is well underway, but is not suitable for rc/stable. ]

Signed-off-by: Jacob Shin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Willy

2012-10-29 14:22:57

by Mark Lord

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On 12-10-29 02:46 AM, Willy Tarreau wrote:
> On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
>> I've isolated the fault down to this specific change
>> that was made between 3.4.9 and 3.4.16.
..
> For the record, it is this commit introduced in 3.4.16 :
>
> commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
..
> Willy


Thanks, Willy.

I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
So there's a fix somewhere in between that perhaps could also get backported to -stable.

-ml

2012-10-29 14:37:55

by Mark Lord

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On 12-10-29 10:22 AM, Mark Lord wrote:
> On 12-10-29 02:46 AM, Willy Tarreau wrote:
>> On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
>>> My server here runs the 3.4.xx series of "stable" kernels.
>>> Until today, it was running 3.4.9.
>>> Today I tried to upgrade it to 3.4.16.
>>> It hangs in setup.c.
>>>
>>> I've isolated the fault down to this specific change
>>> that was made between 3.4.9 and 3.4.16.
>>> Reverting this change allows the system to boot/run normally again.
..
>> For the record, it is this commit introduced in 3.4.16 :
>>
>> commit efd5fa0c1a1d1b46846ea6e8d1a783d0d8a6a721
..
> I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> So there's a fix somewhere in between that perhaps could also get backported to -stable.
..

Heh.. except that kernel has its own issues: it hangs in some kind of screen loop
in the Radeon code (?) when trying to shut down. ctrl-alt-sysrq s+u+s+b gets out of that,
but then it hangs in a similar fashion during the subsequent reboot.

A full power-off was required to get the Radeon video to behave so I could reboot
the system with 3.4.16 again. I'm not going to pursue that issue for now, though.

2012-10-29 14:41:21

by Ben Hutchings

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
..
> Thanks, Willy.
>
> I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> So there's a fix somewhere in between that perhaps could also get backported to -stable.

Might well be:

commit 1f2ff682ac951ed82cc043cf140d2851084512df
Author: Yinghai Lu <[email protected]>
Date: Mon Oct 22 16:35:18 2012 -0700

x86, mm: Use memblock memory loop instead of e820_RAM

However I'm not sure that this loop is correct either. Yinghai, does
your version definitely iterate in increasing pfn order? If not then
the max_pfn_mapped assignment must be conditional.

Ben.

--
Ben Hutchings
Humans are not rational beings; they are rationalising beings.



2012-10-29 14:47:37

by Jacob Shin

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
..
> > Thanks, Willy.
> >
> > I've also now downloaded linux-3.7.0-rc3, and it boots/runs without need for patching.
> > So there's a fix somewhere in between that perhaps could also get backported to -stable.
>
> Might well be:
>
> commit 1f2ff682ac951ed82cc043cf140d2851084512df
> Author: Yinghai Lu <[email protected]>
> Date: Mon Oct 22 16:35:18 2012 -0700
>
> x86, mm: Use memblock memory loop instead of e820_RAM
>
> However I'm not sure that this loop is correct either. Yinghai, does
> your version definitely iterate in increasing pfn order? If not then
> the max_pfn_mapped assignment must be conditional.

Hi, I believe these two commits in mainline should fix Alexander's failing
machine:

844ab6f993b1d32eb40512503d35ff6ad0c57030
f82f64dd9f485e13f29f369772d4a0e868e5633a

This thread has some more details:

https://lkml.org/lkml/2012/10/21/157

Sorry, and thanks!



2012-10-29 16:37:36

by Yinghai Lu

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 7:40 AM, Ben Hutchings <[email protected]> wrote:
> However I'm not sure that this loop is correct either. Yinghai, does
> your version definitely iterate in increasing pfn order? If not then
> the max_pfn_mapped assignment must be conditional.

Yes, memblock regions are kept in address order.

Yinghai

2012-10-29 16:58:30

by Greg Kroah-Hartman

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 09:47:22AM -0500, Jacob Shin wrote:
> On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> > On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
..
>
> Hi, I believe these two commits in mainline should fix Alexander's failing
> machine:
>
> 844ab6f993b1d32eb40512503d35ff6ad0c57030
> f82f64dd9f485e13f29f369772d4a0e868e5633a
>
> This thread has some more details:
>
> https://lkml.org/lkml/2012/10/21/157
>
> Sorry, and thanks!

Thanks, I've queued these up now.

greg k-h

2012-10-29 17:04:49

by Jacob Shin

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 09:58:23AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Oct 29, 2012 at 09:47:22AM -0500, Jacob Shin wrote:
> > On Mon, Oct 29, 2012 at 02:40:58PM +0000, Ben Hutchings wrote:
> > > On Mon, 2012-10-29 at 10:22 -0400, Mark Lord wrote:
> > > > On 12-10-29 02:46 AM, Willy Tarreau wrote:
> > > > > On Mon, Oct 29, 2012 at 12:03:55AM -0400, Mark Lord wrote:
..
> >
> > Hi, I believe these two commits in mainline should fix Alexander's failing
> > machine:
> >
> > 844ab6f993b1d32eb40512503d35ff6ad0c57030
> > f82f64dd9f485e13f29f369772d4a0e868e5633a
> >
> > This thread has some more details:
> >
> > https://lkml.org/lkml/2012/10/21/157
> >
> > Sorry, and thanks!
>
> Thanks, I've queued these up now.

Thanks,

Also, unrelated to Alexander's panic but related to the commit in question
(1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a):

These two commits from Yinghai should also be backported to stable, though I
think that is already in progress (I saw an email to Yinghai saying that the
patch did not apply cleanly and needs to be backported manually):

6ede1fd3cb404c0016de6ac529df46d561bd558b
1f2ff682ac951ed82cc043cf140d2851084512df

Right Yinghai?

Thanks!

-Jacob


2012-10-29 23:00:58

by Mark Lord

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

There's something else very wrong when going from 3.4.9 to 3.4.16.
I've done it on two machines here, one the AMD-450 server (64-bit),
and the other my main notebook (Core2duo 32-bit-PAE).

Both systems feel much more sluggish than usual with 3.4.16 running.
Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
and the usual responsive feel has returned.

Vague, I know, but something bad happened in there somewhere.

Cheers

2012-10-29 23:03:43

by Greg Kroah-Hartman

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
> There's something else very wrong when going from 3.4.9 to 3.4.16.
> I've done it on two machines here, one the AMD-450 server (64-bit),
> and the other my main notebook (Core2duo 32-bit-PAE).
>
> Both systems feel much more sluggish than usual with 3.4.16 running.
> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
> and the usual responsive feel has returned.
>
> Vague, I know, but something bad happened in there somewhere.

That's too vague for me to do anything with, sorry. Bisection would be
good if you can figure out how to measure this.

greg k-h

2012-10-30 01:20:13

by Yinghai Lu

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Mon, Oct 29, 2012 at 4:03 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
>> Both systems feel much more sluggish than usual with 3.4.16 running.
>> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
>> and the usual responsive feel has returned.
>>
>> Vague, I know, but something bad happened in there somewhere.
>
> That's too vague for me to do anything with, sorry. Bisection would be
> good if you can figure out how to measure this.

Yes. At the very least, you could post the boot logs of the working and non-working
kernels. Then we could figure out whether there is a corner case that is not handled
or has been uncovered.

2012-10-30 04:53:10

by Mark Lord

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On 12-10-29 07:03 PM, Greg Kroah-Hartman wrote:
> On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
>> There's something else very wrong when going from 3.4.9 to 3.4.16.
>> I've done it on two machines here, one the AMD-450 server (64-bit),
>> and the other my main notebook (Core2duo 32-bit-PAE).
>>
>> Both systems feel much more sluggish than usual with 3.4.16 running.
>> Reverted them both back to earlier kernels (3.4.9, 3.4.4-PAE),
>> and the usual responsive feel has returned.
>>
>> Vague, I know, but something bad happened in there somewhere.
>
> That's too vague for me to do anything with, sorry. Bisection would be
> good if you can figure out how to measure this.

Well, I'd bet Donkeys to Daisies that reverting the kernel/sched.c changes
would fix the responsiveness, but I haven't tried that yet.
I've already lost enough time debugging the other issues.

This is more an indication that -stable patches perhaps need better review
than they're getting. Take the setup.c breakage: as soon as I pointed it out,
a few people jumped in who knew that it was broken and that patches
existed to fix it.

That kind of thing should be happening before a -stable release,
though I don't know how you would get the Right People to look
at this stuff then rather than after the fact. Maybe a topic
for a future kernel summit or something.

Best wishes.
-ml

2012-10-30 16:30:31

by Greg Kroah-Hartman

Subject: Re: Regression from 3.4.9 to 3.4.16 "stable" kernel

On Tue, Oct 30, 2012 at 12:53:06AM -0400, Mark Lord wrote:
> On 12-10-29 07:03 PM, Greg Kroah-Hartman wrote:
> > On Mon, Oct 29, 2012 at 07:00:54PM -0400, Mark Lord wrote:
..
> Well, I'd bet Donkeys to Daises that reverting the kernel/sched.c changes
> will probably fix the responsiveness, but I haven't done that yet.
> I've lost enough time already debugging the other issues.
>
> This is more just an indication that perhaps -stable patches need better review
> than they're getting. Take the setup.c breakage: as soon as I pointed it out,
> a few people jumped in with knowledge that it was broken, and that patches
> existed to fix it.

There will always be bugs; fixing them quickly is the best that we can
do.

> That kind of thing should be happening before a -stable release,
> though I don't know how you would get the Right People to look
> at this stuff then rather than after the fact. Maybe a topic
> for a future kernel summit or something.

I send patches to everyone involved, and there's a -rc period where
people are _supposed_ to test things out. If you know of a better way
to get other people to test and review, please let me know; this is the
best that we have come up with so far.

thanks,

greg k-h