2012-06-12 05:21:21

by Cong Wang

Subject: [PATCH] x86: revert "x86: Fix S4 regression"

From: Cong Wang <[email protected]>

This reverts the following commit:

commit 8548c84da2f47e71bbbe300f55edb768492575f7
Author: Takashi Iwai <[email protected]>
Date: Sun Oct 23 23:19:12 2011 +0200

x86: Fix S4 regression

Commit 4b239f458 ("x86-64, mm: Put early page table high") causes a S4
regression since 2.6.39, namely the machine reboots occasionally at S4
resume. It doesn't happen always, overall rate is about 1/20. But,
like other bugs, once when this happens, it continues to happen.

This patch fixes the problem by essentially reverting the memory
assignment in the older way.

According to the previous discussion:
http://marc.info/?l=linux-kernel&m=133161674120253&w=2
it seems that so far the best solution is just reverting it.

Takashi, could you help to test if the S4 regression is still
there after this patch?

Reported-by: CAI Qian <[email protected]>
Cc: Dave Young <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Takashi Iwai <[email protected]>
Signed-off-by: Cong Wang <[email protected]>

---
arch/x86/mm/init.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc4e9d8..7ab7975 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -74,8 +74,9 @@ static void __init find_early_table_space(struct map_range *mr, unsigned long en
 #ifdef CONFIG_X86_32
 	/* for fixmap */
 	tables += roundup(__end_of_fixed_addresses * sizeof(pte_t), PAGE_SIZE);
-#endif
+
 	good_end = max_pfn_mapped << PAGE_SHIFT;
+#endif
 
 	base = memblock_find_in_range(start, good_end, tables, PAGE_SIZE);
 	if (!base)
--
1.7.7.6
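
For readers following the hunk above, here is a minimal sketch of how the upper bound of the page-table search changes with this revert. It is an illustration only, not kernel code: the function names and the `is_x86_32` flag are hypothetical stand-ins for the `CONFIG_X86_32` preprocessor check, and it assumes `good_end` starts out equal to `end`, which the hunk itself does not show.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Before the revert: the clamp sits outside the #ifdef, so the search
 * is always limited to memory that is already mapped.
 */
uint64_t limit_before_revert(uint64_t end, uint64_t max_pfn_mapped)
{
	(void)end; /* unused: the clamp always wins */
	return max_pfn_mapped << PAGE_SHIFT;
}

/*
 * After the revert: the clamp moves inside the #ifdef, so only 32-bit
 * builds are limited; other builds may search up to `end`
 * (assumed to be the initial value of good_end).
 */
uint64_t limit_after_revert(uint64_t end, uint64_t max_pfn_mapped,
			    int is_x86_32)
{
	uint64_t good_end = end;

	if (is_x86_32)
		good_end = max_pfn_mapped << PAGE_SHIFT;
	return good_end;
}
```

With `max_pfn_mapped` at 0x20000 pages, the clamped limit is 512 MiB, while the unclamped limit is whatever `end` the caller passes in.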


2012-06-15 11:11:13

by Cong Wang

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Tue, Jun 12, 2012 at 1:21 PM, Cong Wang <[email protected]> wrote:
>
> Takashi, could you help to test if the S4 regression is still
> there after this patch?

Hello,

Any comments?

I have tested this revert on my laptop by running a simple
"suspend and resume" test several times, and I didn't see any
problem.

Thanks!

2012-06-15 12:15:55

by Takashi Iwai

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

At Fri, 15 Jun 2012 19:11:12 +0800,
Cong Wang wrote:
>
> On Tue, Jun 12, 2012 at 1:21 PM, Cong Wang <[email protected]> wrote:
> >
> > Takashi, could you help to test if the S4 regression is still
> > there after this patch?
>
> Hello,
>
> Any comments?
>
> I have tested this revert on my laptop by running a simple
> "suspend and resume" test several times, and I didn't see any
> problem.

Sorry for the late response.

The problem happens only on certain laptops. I've tested the recent
IvyBridge laptops and these were OK with the reverted patch. But a
SandyBridge laptop, the one I tested last year, still hits the
S4 problem with the reverted kernel.

But the recent kernels seem to have other S4 problems on this
machine, so it's not 100% clear whether it's triggered by that.
At least, it jumps to the boot at S4 resume more frequently when the
patch is reverted.

So, I need to start from 2.6.32 again to see what regressions have
been introduced. It'll take time, since I'd have to run 20 S4 cycles
to reproduce the bug.


Takashi

2012-06-15 19:04:38

by Rafael J. Wysocki

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Friday, June 15, 2012, Takashi Iwai wrote:
> At Fri, 15 Jun 2012 19:11:12 +0800,
> Cong Wang wrote:
> >
> > On Tue, Jun 12, 2012 at 1:21 PM, Cong Wang <[email protected]> wrote:
> > >
> > > Takashi, could you help to test if the S4 regression is still
> > > there after this patch?
> >
> > Hello,
> >
> > Any comments?
> >
> > I have tested this revert on my laptop by running a simple
> > "suspend and resume" test several times, and I didn't see any
> > problem.
>
> Sorry for the late response.
>
> The problem happens only on certain laptops. I've tested the recent
> IvyBridge laptops and these were OK with the reverted patch. But a
> SandyBridge laptop, the one I tested last year, still hits the
> S4 problem with the reverted kernel.
>
> But the recent kernels seem to have other S4 problems on this
> machine, so it's not 100% clear whether it's triggered by that.
> At least, it jumps to the boot at S4 resume more frequently when the
> patch is reverted.
>
> So, I need to start from 2.6.32 again to see what regressions have
> been introduced. It'll take time, since I'd have to run 20 S4 cycles
> to reproduce the bug.

Thanks for doing this, please let me know if there's any way I can help.

Rafael

2012-06-18 03:10:06

by Cong Wang

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Fri, Jun 15, 2012 at 8:15 PM, Takashi Iwai <[email protected]> wrote:
> At Fri, 15 Jun 2012 19:11:12 +0800,
> Cong Wang wrote:
>>
>> On Tue, Jun 12, 2012 at 1:21 PM, Cong Wang <[email protected]> wrote:
>> >
>> > Takashi, could you help to test if the S4 regression is still
>> > there after this patch?
>>
>> Hello,
>>
>> Any comments?
>>
>> I have tested this revert on my laptop by running a simple
>> "suspend and resume" test several times, and I didn't see any
>> problem.
>
> Sorry for the late response.
>
> The problem happens only on certain laptops.  I've tested the recent
> IvyBridge laptops and these were OK with the reverted patch.  But a
> SandyBridge laptop, the one I tested last year, still hits the
> S4 problem with the reverted kernel.

Yeah, but the crashkernel=512M regression seems to happen
on more machines. :)

>
> But the recent kernels seem to have other S4 problems on this
> machine, so it's not 100% clear whether it's triggered by that.
> At least, it jumps to the boot at S4 resume more frequently when the
> patch is reverted.
>
> So, I need to start from 2.6.32 again to see what regressions have
> been introduced.  It'll take time, since I'd have to run 20 S4 cycles
> to reproduce the bug.
>

Ok, thanks for your update!

2012-08-06 20:42:58

by Vivek Goyal

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Wed, Jul 25, 2012 at 09:19:08AM +0900, Takao Indoh wrote:
> (2012/07/25 0:55), Cong Wang wrote:
> >On Mon, 2012-07-23 at 20:22 +0900, Takao Indoh wrote:
> >>(2012/07/23 19:00), Dave Young wrote:
> >>>On 07/17/2012 11:15 AM, Takao Indoh wrote:
> >>>
> >>>>Hi Cong,
> >>>>
> >>>>When I tested kdump with 3.5.0-rc6 kernel, I found a problem of kdump
> >>>>kernel's panic in find_early_table_space().
> >>>>
> >>>>init_memory_mapping: [mem 0x00000000-0x36ffafff]
> >>>>Kernel panic - not syncing: Cannot find space for the kernel page tables
> >>>>Pid: 0, comm: swapper Not tainted 3.5.0-rc6 #17
> >>>>Call Trace:
> >>>> [<ffffffff8158549b>] panic+0xb8/0x1c8
> >>>> [<ffffffff8158565d>] ? printk+0x48/0x4a
> >>>> [<ffffffff8157304c>] init_memory_mapping+0x46c/0x530
> >>>> [<ffffffff818a73c7>] setup_arch+0x669/0xb0e
> >>>> [<ffffffff8158565d>] ? printk+0x48/0x4a
> >>>> [<ffffffff818a3a1f>] start_kernel+0x9b/0x34a
> >>>> [<ffffffff818a332d>] x86_64_start_reservations+0x131/0x136
> >>>> [<ffffffff818a341f>] x86_64_start_kernel+0xed/0xf4
> >>>>
> >>>>In find_early_table_space(), a kernel tries to find free area below 512M
> >>>>for pgtable using memblock_find_in_range, but it fails because kdump
> >>>>kernel does not have enough free space below 512M due to the memmap
> >>>>restriction. This is the memmap option specified against kdump kernel
> >>>>when crashkernel=128M.
> >>>>
> >>>>memmap=560K@64K memmap=130492K@770608K
> >>>>
> >>>>Only 560KB area is available and it is not sufficient for pgtable (it
> >>>>seems that about 1.8MB area is needed for pgtable). This problem is
> >>>>fixed by your revert patch. I hope this patch gets merged.
> >>>
> >>>
> >>>I can reproduce this issue as well, probably related to some x86 mm init
> >>>commits, this alloc failure does not happen with reverting below commits:
> >>>
> >>>bd2753b2dda7bb43c7468826de75f49c6a7e8965
> >>>722bc6b16771ed80871e1fd81c86d3627dda2ac8
> >>Yeah, my result of bisect is as follows and at first I thought the
> >>commit 722bc6 caused this regression.
> >>
> >>722bc6b16771ed80871e1fd81c86d3627dda2ac8 is the first bad commit
> >>commit 722bc6b16771ed80871e1fd81c86d3627dda2ac8
> >>Author: WANG Cong <[email protected]>
> >>Date: Mon Mar 5 15:05:13 2012 -0800
> >>
> >>IIUC, this commit just fixes a bug of counting pgtable entries. As the
> >>result, another problem came up to the surface. In the case of my
> >>machine(16GB memory), before applying 722bc6, find_early_table_space()
> >>requests about 12KB free area and it can be got from 560K@64K area
> >>luckily. I think the size find_early_table_space() requests was a bug.
> >>After the bug is fixed by the commit 722bc6, find_early_table_space()
> >>requires 1.8MB area and it fails as I wrote.
> >
> >Thanks for tracking this, Takao!
> >
> >I bet you are using x86_64 not x86 PAE? If so, could you try this patch
> >https://patchwork.kernel.org/patch/1195751/
> >? I already reviewed it.
>
> Great, I applied it and now kdump works. Thanks!

Did something happen with this patch? We definitely want to regain the
ability to reserve 512MB of kdump memory.

Thanks
Vivek
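
The failure quoted above can be reproduced on paper. The sketch below is an illustration under stated assumptions, not the kernel's implementation: it assumes the whole range is mapped with 4 KiB pages and 8-byte entries, and it uses a simple first-fit scan rather than the real `memblock_find_in_range()`. The names `estimate_table_bytes`, `find_below`, and `kdump_ranges` are hypothetical. The ranges come from the quoted `memmap=` options, and the mapping end 0x36ffb000 from the quoted `init_memory_mapping` line.

```c
#include <stddef.h>
#include <stdint.h>

#define KIB        1024ULL
#define MIB        (1024ULL * KIB)
#define PAGE_SIZE  4096ULL
#define ENTRY_SIZE 8ULL	/* assumed: 8-byte page-table entries (x86_64) */

struct mem_range {
	uint64_t start;
	uint64_t size;
};

/* The two usable ranges from the quoted memmap= options. */
static const struct mem_range kdump_ranges[] = {
	{ 64 * KIB, 560 * KIB },	/* memmap=560K@64K */
	{ 770608 * KIB, 130492 * KIB },	/* memmap=130492K@770608K */
};

static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Table bytes for one level whose entries each cover 2^shift bytes. */
static uint64_t level_bytes(uint64_t end, unsigned int shift)
{
	uint64_t entries = (end + (1ULL << shift) - 1) >> shift;

	return round_up_to(entries * ENTRY_SIZE, PAGE_SIZE);
}

/* Estimate for mapping [0, end) with 4 KiB pages only (assumption). */
uint64_t estimate_table_bytes(uint64_t end)
{
	return level_bytes(end, 30) +	/* PUD level, 1 GiB per entry */
	       level_bytes(end, 21) +	/* PMD level, 2 MiB per entry */
	       level_bytes(end, 12);	/* PTE level, 4 KiB per entry */
}

/*
 * First-fit scan: base of a page-aligned block of `size` bytes that
 * lies below `limit`, or 0 if no range can hold it.
 */
uint64_t find_below(const struct mem_range *r, size_t n,
		    uint64_t limit, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t base = round_up_to(r[i].start, PAGE_SIZE);
		uint64_t top = r[i].start + r[i].size;

		if (top > limit)
			top = limit;
		if (base < top && top - base >= size)
			return base;
	}
	return 0;
}
```

Under these assumptions, `estimate_table_bytes(0x36ffb000ULL)` comes to 1810432 bytes, in line with the "about 1.8MB" reported. Calling `find_below()` with a 512 MiB limit returns 0 for that size, because only the 560K range lies below the limit, whereas a limit of 0x36ffb000 lets the second range satisfy the request. A 12 KiB request fits in the 560K range either way, which matches the observation that the smaller, miscounted size happened to succeed.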

2012-08-06 21:56:22

by Yinghai Lu

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Mon, Aug 6, 2012 at 1:42 PM, Vivek Goyal <[email protected]> wrote:

>
> Did something happen with this patch? We definitely want to regain the
> ability to reserve 512MB of kdump memory.

Maybe Ingo and Peter could push that to Linus.

Assume we have Acked-by from you, me and others.

Thanks

Yinghai

2012-08-11 17:58:10

by Jerry Snitselaar

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Wed Jul 25 12, Takao Indoh wrote:
> >Thanks for tracking this, Takao!
> >
> >I bet you are using x86_64 not x86 PAE? If so, could you try this patch
> >https://patchwork.kernel.org/patch/1195751/
> >? I already reviewed it.
>
> Great, I applied it and now kdump works. Thanks!
>
> Thanks,
> Takao Indoh
>

This patch from Jacob Shin solves the problem, and seems like it might
be a better solution.

[PATCH 2/5] x86: find_early_table_space based on memory ranges that
are being mapped

https://lkml.org/lkml/2012/8/9/540

2012-08-11 18:26:50

by Jerry Snitselaar

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Sat Aug 11 12, Jerry Snitselaar wrote:
> On Wed Jul 25 12, Takao Indoh wrote:
> > >Thanks for tracking this, Takao!
> > >
> > >I bet you are using x86_64 not x86 PAE? If so, could you try this patch
> > >https://patchwork.kernel.org/patch/1195751/
> > >? I already reviewed it.
> >
> > Great, I applied it and now kdump works. Thanks!
> >
> > Thanks,
> > Takao Indoh
> >
>
> This patch from Jacob Shin solves the problem, and seems like it might
> be a better solution.
>
> [PATCH 2/5] x86: find_early_table_space based on memory ranges that
> are being mapped
>
> https://lkml.org/lkml/2012/8/9/540
>
Actually, apply that series of 5 patches.

2012-08-11 18:35:16

by H. Peter Anvin

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On 08/11/2012 11:26 AM, Jerry Snitselaar wrote:
>>
>> This patch from Jacob Shin solves the problem, and seems like it might
>> be a better solution.
>>
>> [PATCH 2/5] x86: find_early_table_space based on memory ranges that
>> are being mapped
>>
>> https://lkml.org/lkml/2012/8/9/540
>>
> Actually, apply that series of 5 patches.
>

I was hoping Tejun would comment on it, but I think I'll pull it into -tip.

However, the real question is what we should do for -stable; applying
the full patch series seems a bit aggressive for that. On the other
hand, if it really is The Right Thing then perhaps we should do so anyway.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-08-11 18:39:37

by H. Peter Anvin

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On 08/11/2012 11:34 AM, H. Peter Anvin wrote:
> On 08/11/2012 11:26 AM, Jerry Snitselaar wrote:
>>>
>>> This patch from Jacob Shin solves the problem, and seems like it might
>>> be a better solution.
>>>
>>> [PATCH 2/5] x86: find_early_table_space based on memory ranges that
>>> are being mapped
>>>
>>> https://lkml.org/lkml/2012/8/9/540
>>>
>> Actually, apply that series of 5 patches.
>>
>
> I was hoping Tejun would comment on it, but I think I'll pull it into -tip.
>
> However, the real question is what we should do for -stable; applying
> the full patch series seems a bit aggressive for that. On the other
> hand, if it really is The Right Thing then perhaps we should do so anyway.
>

Ah, right... still waiting for a rev of the patch to address Yinghai's
legitimate request for minor code restructuring. Other than that, the
patchset is really The Right Thing.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-08-11 19:33:36

by Tejun Heo

Subject: Re: [PATCH] x86: revert "x86: Fix S4 regression"

On Sat, Aug 11, 2012 at 11:34:22AM -0700, H. Peter Anvin wrote:
> On 08/11/2012 11:26 AM, Jerry Snitselaar wrote:
> >>
> >>This patch from Jacob Shin solves the problem, and seems like it might
> >>be a better solution.
> >>
> >>[PATCH 2/5] x86: find_early_table_space based on memory ranges that
> >>are being mapped
> >>
> >>https://lkml.org/lkml/2012/8/9/540
> >>
> >Actually, apply that series of 5 patches.
> >
>
> I was hoping Tejun would comment on it, but I think I'll pull it into -tip.

Wasn't cc'd. Will take a look.

Thanks.

--
tejun