On Sat 2017-12-09 14:47:41, Linus Torvalds wrote:
> On Dec 9, 2017 14:01, "Pavel Machek" <[email protected]> wrote:
>
>
> Strange. I was at 4.15-rc1, and suspend worked there (thinkpad x60,
> 32-bit). It was broken in -next. I updated to current mainline (
> 4ded3bec65a07343258ed8fd9d46483f032d866f ) and suspend is broken.
>
Can you do something about html emails? Quoting them doesn't work too well.
> Any chance to bisect it? This doesn't sound like the other problem
> we had.
Let me try...
v4.15-rc2: suspends/resumes ok.
4ded3be: hangs on resume.
Given that between those, there was supposed "fix" for suspend, I
believe I should try reverting that one first. .. if someone can tell
me commit id, that would help.
And yes, if everything else fails, I can probably bisect.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <[email protected]> wrote:
>
> Can you do something about html emails? Quoting them doesn't work too well.
Yeah, and they don't show up onlkml either because of rules. I try to
avoid them, but have been more on mobile for various reasons lately
than usual. That should be over and done with after today, though.
>> Any chance to bisect it? This doesn't sound like the other problem
>> we had.
>
> Let me try...
>
> v4.15-rc2: suspends/resumes ok.
> 4ded3be: hangs on resume.
>
> Given that between those, there was supposed "fix" for suspend, I
> believe I should try reverting that one first. .. if someone can tell
> me commit id, that would help.
The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
bugs in __restore_processor_context()") so you can certainly see if it
works before that (or just reverting it).
But there are also a few other x86 low-level things there, and that
fix really looks very safe, so I'd almost expect something else to
have triggered your problem. There's less than 500 commits in that
range you have, so a few bisections should narrow it down a lot.
Linus
On Sun 2017-12-10 08:37:56, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 8:23 AM, Pavel Machek <[email protected]> wrote:
> >
> > Can you do something about html emails? Quoting them doesn't work too well.
>
> Yeah, and they don't show up onlkml either because of rules. I try to
> avoid them, but have been more on mobile for various reasons lately
> than usual. That should be over and done with after today, though.
>
> >> Any chance to bisect it? This doesn't sound like the other problem
> >> we had.
> >
> > Let me try...
> >
> > v4.15-rc2: suspends/resumes ok.
> > 4ded3be: hangs on resume.
> >
> > Given that between those, there was supposed "fix" for suspend, I
> > believe I should try reverting that one first. .. if someone can tell
> > me commit id, that would help.
>
> The fix in there should be 5b06bbcfc2c6 ("x86/power: Fix some ordering
> bugs in __restore_processor_context()") so you can certainly see if it
> works before that (or just reverting it).
Revert is easier.
> But there are also a few other x86 low-level things there, and that
> fix really looks very safe, so I'd almost expect something else to
> have triggered your problem. There's less than 500 commits in that
> range you have, so a few bisections should narrow it down a lot.
No, that commit does _look_ pretty suspect to me...
Confirmed, revert fixes it. You see how it moves fix_processor_context
around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
machines exist? Aha.
Which brings me to .. various people do automated testing of
kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
boot and suspend would be very nice. The last item is not hard, either:
sudo rtcwake -l -m mem -s 5
...should take 10 seconds or so.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <[email protected]> wrote:
>
> Confirmed, revert fixes it. You see how it moves fix_processor_context
> around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> machines exist? Aha.
Yeah, people do.
Andy?
> Which brings me to .. various people do automated testing of
> kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit for
> boot and suspend would be very nice. The last item is not hard, either:
>
> sudo rtcwake -l -m mem -s 5
>
> ...should take 10 seconds or so.
I'm told 0day does *some* suspend/resume testing, but I think it's
pretty limited, partly because the kinds of machines it primarily
works on don't really support suspend/resume at all. I'm also not sure
just how many of those machines are 32-bit at all..
But I'm adding Zhang Rui to the cc, to see if my recollection is right.
Because you're right, more suspend/resume automated testing would be
good to have. And yes, people test mainly 64-bit these days.
Also, I'm not even sure what the 0day rules are for just plain
mainline. I don't tend to see a lot of breakage reports, even though
I'd expect to. This came in from the x86 trees (and those do their own
tests too, but probably not suspend/resume either), but it hit my tree
fairly soon after going into the x86 -tip trees.
Linus
On Sun 2017-12-10 12:30:52, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <[email protected]> wrote:
> >
> > Confirmed, revert fixes it. You see how it moves fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
>
> Yeah, people do.
>
> Andy?
For the record... this should fix it. Tested on x60. More tests pending.
Signed-off-by: Pavel Machek <[email protected]>
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..d59f05f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
write_cr0(ctxt->cr0);
/*
- * now restore the descriptor tables to their proper values
- * ltr is done i fix_processor_context().
+ * Now restore the descriptor tables to their proper values
+ * ltr is done in fix_processor_context().
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
@@ -235,8 +235,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrl(MSR_GS_BASE, ctxt->gs_base);
#endif
- fix_processor_context();
-
/*
* Restore segment registers. This happens after restoring the GDT
* and LDT, which happen in fix_processor_context().
@@ -252,8 +250,12 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
*/
if (boot_cpu_has(X86_FEATURE_SEP))
enable_sep_cpu();
+
+ fix_processor_context();
#else
/* CONFIG_X86_64 */
+ fix_processor_context();
+
asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <[email protected]> wrote:
>
> For the record... this should fix it. Tested on x60. More tests pending.
This can't be right.
At the very least, now the comment is wrong. And the comment does seem
relevant for 32-bit too:
> - fix_processor_context();
> -
> /*
> * Restore segment registers. This happens after restoring the GDT
> * and LDT, which happen in fix_processor_context().
Notice? You've moved down the 32-bit fix_processor_context() call to
after the loadsegment() calls, which smells wrong.
That said, this *all* smells wrong. Why is there a special
fix_processor_context() function at all with different 32-bit and
64-bit behavior? This code is all written to be maximally confusing.
I think this could do with some re-org to make it more logical. That
"some random things done in fix_processor_context(), other random
things done directly in __restore_processor_state()" makes no sense at
all to me. There's no logic to what is done where.
Linus
On Sun 2017-12-10 13:28:50, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 12:43 PM, Pavel Machek <[email protected]> wrote:
> >
> > For the record... this should fix it. Tested on x60. More tests pending.
>
> This can't be right.
>
> At the very least, now the comment is wrong. And the comment does seem
> relevant for 32-bit too:
Well, take a look at orignal patch. I'm reverting 32-bit code to
v4.15-rc1 version, while keeping 64-bit code at v4.15-rc3
version. Yes, my brain hurts from looking at the code :-(.
In the meantime, I did short test on 64-bit machine. No ill effect observed.
Hmm. Aha. Yes, the comment is wrong... as it was in wrong in -rc1.
> > - fix_processor_context();
> > -
> > /*
> > * Restore segment registers. This happens after restoring the GDT
> > * and LDT, which happen in fix_processor_context().
>
> Notice? You've moved down the 32-bit fix_processor_context() call to
> after the loadsegment() calls, which smells wrong.
Yeah, I did. There's where it was in v4.15-rc1, and that's what ws
working for me.
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
>
> I think this could do with some re-org to make it more logical. That
> "some random things done in fix_processor_context(), other random
> things done directly in __restore_processor_state()" makes no sense at
> all to me. There's no logic to what is done where.
I have to agree.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
(unintentionally?) reordered stuff with respect to
fix_processor_context() on 32-bit and 64-bit. We undo that change on
32-bit.
While we are at it, fix a comment.
Signed-off-by: Pavel Machek <[email protected]>
Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 5191de1..af7b613 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
write_cr0(ctxt->cr0);
/*
- * now restore the descriptor tables to their proper values
- * ltr is done i fix_processor_context().
+ * Now restore the descriptor tables to their proper values
+ * ltr is done in fix_processor_context().
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
@@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrl(MSR_GS_BASE, ctxt->gs_base);
#endif
- fix_processor_context();
-
+#ifdef CONFIG_X86_32
/*
- * Restore segment registers. This happens after restoring the GDT
- * and LDT, which happen in fix_processor_context().
+ * Restore segment registers.
*/
-#ifdef CONFIG_X86_32
+
loadsegment(es, ctxt->es);
loadsegment(fs, ctxt->fs);
loadsegment(gs, ctxt->gs);
@@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
*/
if (boot_cpu_has(X86_FEATURE_SEP))
enable_sep_cpu();
+
+ fix_processor_context();
#else
/* CONFIG_X86_64 */
+ /*
+ * Restore segment registers. This happens after restoring the GDT
+ * and LDT, which happen in fix_processor_context().
+ */
+
+ fix_processor_context();
+
asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
>
>
> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> (unintentionally?) reordered stuff with respect to
> fix_processor_context() on 32-bit and 64-bit. We undo that change on
> 32-bit.
>
Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
> While we are at it, fix a comment.
>
> Signed-off-by: Pavel Machek <[email protected]>
> Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 5191de1..af7b613 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> write_cr0(ctxt->cr0);
>
> /*
> - * now restore the descriptor tables to their proper values
> - * ltr is done i fix_processor_context().
> + * Now restore the descriptor tables to their proper values
> + * ltr is done in fix_processor_context().
> */
> #ifdef CONFIG_X86_32
> load_idt(&ctxt->idt);
> @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> #endif
>
> - fix_processor_context();
> -
> +#ifdef CONFIG_X86_32
> /*
> - * Restore segment registers. This happens after restoring the GDT
> - * and LDT, which happen in fix_processor_context().
> + * Restore segment registers.
> */
> -#ifdef CONFIG_X86_32
> +
> loadsegment(es, ctxt->es);
> loadsegment(fs, ctxt->fs);
> loadsegment(gs, ctxt->gs);
> @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> */
> if (boot_cpu_has(X86_FEATURE_SEP))
> enable_sep_cpu();
> +
> + fix_processor_context();
> #else
> /* CONFIG_X86_64 */
> + /*
> + * Restore segment registers. This happens after restoring the GDT
> + * and LDT, which happen in fix_processor_context().
> + */
> +
> + fix_processor_context();
> +
> asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
> asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
> asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
No, I can't, sorry.
> I'm guessing that the real issue is that 32-bit needs %fs restored
> early for TLS.
Maybe. I can test patches...
I don't think it would be good idea to revert
5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
regression in -rc2, I believe we should fix the regression now, and
then we can try to provide cleaner solution.
As Linus noted, the code is quite "interesting".
Pavel
>
> > While we are at it, fix a comment.
> >
> > Signed-off-by: Pavel Machek <[email protected]>
> > Fixes: 5b06bbcfc2c621da3009da8decb7511500c293ed
> >
> > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> > index 5191de1..af7b613 100644
> > --- a/arch/x86/power/cpu.c
> > +++ b/arch/x86/power/cpu.c
> > @@ -216,8 +216,8 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > write_cr0(ctxt->cr0);
> >
> > /*
> > - * now restore the descriptor tables to their proper values
> > - * ltr is done i fix_processor_context().
> > + * Now restore the descriptor tables to their proper values
> > + * ltr is done in fix_processor_context().
> > */
> > #ifdef CONFIG_X86_32
> > load_idt(&ctxt->idt);
> > @@ -235,13 +235,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > wrmsrl(MSR_GS_BASE, ctxt->gs_base);
> > #endif
> >
> > - fix_processor_context();
> > -
> > +#ifdef CONFIG_X86_32
> > /*
> > - * Restore segment registers. This happens after restoring the GDT
> > - * and LDT, which happen in fix_processor_context().
> > + * Restore segment registers.
> > */
> > -#ifdef CONFIG_X86_32
> > +
> > loadsegment(es, ctxt->es);
> > loadsegment(fs, ctxt->fs);
> > loadsegment(gs, ctxt->gs);
> > @@ -252,8 +250,17 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> > */
> > if (boot_cpu_has(X86_FEATURE_SEP))
> > enable_sep_cpu();
> > +
> > + fix_processor_context();
> > #else
> > /* CONFIG_X86_64 */
> > + /*
> > + * Restore segment registers. This happens after restoring the GDT
> > + * and LDT, which happen in fix_processor_context().
> > + */
> > +
> > + fix_processor_context();
> > +
> > asm volatile ("movw %0, %%ds" :: "r" (ctxt->ds));
> > asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
> > asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
> >
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 12/11/2017 12:20 AM, Pavel Machek wrote:
> On Sun 2017-12-10 13:58:23, Andy Lutomirski wrote:
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>
> No, I can't, sorry.
>
>> I'm guessing that the real issue is that 32-bit needs %fs restored
>> early for TLS.
>
> Maybe. I can test patches...
>
> I don't think it would be good idea to revert
> 5b06bbcfc2c621da3009da8decb7511500c293ed, but since it introduced
> regression in -rc2, I believe we should fix the regression now, and
> then we can try to provide cleaner solution.
>
I can confirm Pavel's findings. Commit 5b06bbcfc2c6 ("x86/power: Fix
some ordering bugs in __restore_processor_context()") broke the
suspend/resume on 32-bit kernel.
v4.15-rc3 works either by reverting the commit or by Pavel's patch.
Fortunately Pavel's patch still keeps the 64-bit suspend/resume ok.
--
Jarkko
On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <[email protected]> wrote:
> >
> >
> > Confirmed, revert fixes it. You see how it moves
> > fix_processor_context
> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
> > machines exist? Aha.
> Yeah, people do.
>
> Andy?
>
> >
> > Which brings me to .. various people do automated testing of
> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
> > for
> > boot and suspend would be very nice. The last item is not hard,
> > either:
> >
> > sudo rtcwake -l -m mem -s 5
> >
> > ...should take 10 seconds or so.
> I'm told 0day does *some* suspend/resume testing, but I think it's
> pretty limited, partly because the kinds of machines it primarily
> works on don't really support suspend/resume at all.
currently, we're running suspend test on 1 platform only, with 64 bit
kernel. suspend test will be enabled on more platforms (laptops) in
next two weeks.
I will check why it does not find the first regression introduced by
ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
native_load_gs_index()").
> I'm also not sure
> just how many of those machines are 32-bit at all..
for this, I suppose it can be reproduced if we use 32-bit kernel and
rootfs, right? Then it's easier to enable this in 0Day.
thanks,
rui
>
> But I'm adding Zhang Rui to the cc, to see if my recollection is
> right.
>
> Because you're right, more suspend/resume automated testing would be
> good to have. And yes, people test mainly 64-bit these days.
>
> Also, I'm not even sure what the 0day rules are for just plain
> mainline. I don't tend to see a lot of breakage reports, even though
> I'd expect to. This came in from the x86 trees (and those do their
> own
> tests too, but probably not suspend/resume either), but it hit my
> tree
> fairly soon after going into the x86 -tip trees.
>
> Linus
On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
I *think* you are right.
Anyway, that should be easy enough to verify.
Pavel, can you please check if the below change works too?
---
arch/x86/power/cpu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-pm/arch/x86/power/cpu.c
===================================================================
--- linux-pm.orig/arch/x86/power/cpu.c
+++ linux-pm/arch/x86/power/cpu.c
@@ -221,6 +221,8 @@ static void notrace __restore_processor_
*/
#ifdef CONFIG_X86_32
load_idt(&ctxt->idt);
+
+ loadsegment(fs, ctxt->fs);
#else
/* CONFIG_X86_64 */
load_idt((const struct desc_ptr *)&ctxt->idt_limit);
@@ -243,7 +245,6 @@ static void notrace __restore_processor_
*/
#ifdef CONFIG_X86_32
loadsegment(es, ctxt->es);
- loadsegment(fs, ctxt->fs);
loadsegment(gs, ctxt->gs);
loadsegment(ss, ctxt->ss);
On Monday, December 11, 2017 3:22:39 PM CET Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
> >
> > > On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
> > >
> > >
> > > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > > (unintentionally?) reordered stuff with respect to
> > > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > > 32-bit.
> > >
> >
> > Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
> >
> > I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
Hmm. Don't we need to restore GS on 32-bit before doing the per_cpu() thing
in fix_processor_context()?
On 12/11/2017 04:22 PM, Rafael J. Wysocki wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>>> On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
>>>
>>>
>>> After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>>> (unintentionally?) reordered stuff with respect to
>>> fix_processor_context() on 32-bit and 64-bit. We undo that change on
>>> 32-bit.
>>>
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
>
> Anyway, that should be easy enough to verify.
>
> Pavel, can you please check if the below change works too?
>
> ---
> arch/x86/power/cpu.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
Works here on my test machine.
Tested-by: Jarkko Nikula <[email protected]>
* Andy Lutomirski <[email protected]> wrote:
>
>
> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
> >
> >
> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
> > (unintentionally?) reordered stuff with respect to
> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
> > 32-bit.
> >
>
> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>
> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
Does some early percpu primitive need GS as well perhaps?
Might be safest to restore both FS and GS early.
Thanks,
Ingo
On Mon, Dec 11, 2017 at 7:13 AM, Ingo Molnar <[email protected]> wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>>
>>
>> > On Dec 10, 2017, at 1:38 PM, Pavel Machek <[email protected]> wrote:
>> >
>> >
>> > After 4.15-rc2, suspend stopped working on Thinkpad X60. 5b06bbc
>> > (unintentionally?) reordered stuff with respect to
>> > fix_processor_context() on 32-bit and 64-bit. We undo that change on
>> > 32-bit.
>> >
>>
>> Can you explain what was wrong with the reordering? Your patch certainly *looks* incorrect.
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> Does some early percpu primitive need GS as well perhaps?
>
> Might be safest to restore both FS and GS early.
>
fs needs to be restored early for TLS. gs needs to be restored early,
maybe, if !X86_32_LAZY_GS -- it's used for stack-protector. If
X86_32_LAZY_GS, gs must *not* be restored early.
On Mon, Dec 11, 2017 at 6:09 AM, Zhang Rui <[email protected]> wrote:
> On Sun, 2017-12-10 at 12:30 -0800, Linus Torvalds wrote:
>> On Sun, Dec 10, 2017 at 10:56 AM, Pavel Machek <[email protected]> wrote:
>> >
>> >
>> > Confirmed, revert fixes it. You see how it moves
>> > fix_processor_context
>> > around #ifdef CONFIG_X86_32 block? And how people forget 32-bit
>> > machines exist? Aha.
>> Yeah, people do.
>>
>> Andy?
>>
>> >
>> > Which brings me to .. various people do automated testing of
>> > kernel. Testing 32-bit kernel for boot, and both 32-bit and 64-bit
>> > for
>> > boot and suspend would be very nice. The last item is not hard,
>> > either:
>> >
>> > sudo rtcwake -l -m mem -s 5
>> >
>> > ...should take 10 seconds or so.
>> I'm told 0day does *some* suspend/resume testing, but I think it's
>> pretty limited, partly because the kinds of machines it primarily
>> works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
>
> I will check why it does not find the first regression introduced by
> ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to
> native_load_gs_index()").
>
>> I'm also not sure
>> just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
>
Yes.
The 64-bit problem should also be reproducible with rtcwake even in a vm.
Also, on this topic, could make run_tests in
tools/testing/selftests/x86 be added to the rotation as well? The
testing dir should match the kernel being tested IMO.
> thanks,
> rui
>>
>> But I'm adding Zhang Rui to the cc, to see if my recollection is
>> right.
>>
>> Because you're right, more suspend/resume automated testing would be
>> good to have. And yes, people test mainly 64-bit these days.
>>
>> Also, I'm not even sure what the 0day rules are for just plain
>> mainline. I don't tend to see a lot of breakage reports, even though
>> I'd expect to. This came in from the x86 trees (and those do their
>> own
>> tests too, but probably not suspend/resume either), but it hit my
>> tree
>> fairly soon after going into the x86 -tip trees.
>>
>> Linus
On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>
>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>
> I *think* you are right.
>
> Anyway, that should be easy enough to verify.
>
> Pavel, can you please check if the below change works too?
So Jarkko confirmed this works for him, but the more I look at this
crap, the less I like it.
Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
values in the kernel, with %fs being __KERNEL_PERCPU, and the others
being __USER_DS?
Nothing else can possibly be valid, as far as I can tell.
I think we actually leave the user-space percpu segment in %gs (or the
stack canary base), so that one we should actually save/restore, but
I'm getting the feeling that we should just reset the other segment
registers to known values on 32-bit.
Also, why does the 32-bit code do
loadsegment(es, ctxt->es);
but the 64-bit code does
asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
all within the 64-bit case.
In particular, note how we reload the %gs segment in between the two -
wouldn't that mess with the currently active gs base if %gs can be
non-zero?
Christ, what a mess.
So I think that whole sequence is garbage. It has been written as some
kind of "save and restore registers", but that's not what it really
then does - or what it should do.
It should make sure to restore a sane kernel state, not some random
register state.
And the 32-bit and 64-bit code really should strive to be at least
_sanely_ different, not this randomly and insanely different mess.
But yes, Rafael's patch looks like the minimal one-liner. But I think
we should do the %gs load early too for the 32-bit stack canary case,
kind of like we need to do %fs for percpu base.
Linus
On Mon, Dec 11, 2017 at 10:31 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Dec 11, 2017 at 6:22 AM, Rafael J. Wysocki <[email protected]> wrote:
>> On Sunday, December 10, 2017 10:58:23 PM CET Andy Lutomirski wrote:
>>>
>>> I'm guessing that the real issue is that 32-bit needs %fs restored early for TLS.
>>
>> I *think* you are right.
>>
>> Anyway, that should be easy enough to verify.
>>
>> Pavel, can you please check if the below change works too?
>
> So Jarkko confirmed this works for him, but the more I look at this
> crap, the less I like it.
>
> Why do we save fs/ds/es/ss at all on x86-32? Don't they all have fixed
> values in the kernel, with %fs being __KERNEL_PERCPU, and the others
> being __USER_DS?
>
> Nothing else can possibly be valid, as far as I can tell.
>
> I think we actually leave the user-space percpu segment in %gs (or the
> stack canary base), so that one we should actually save/restore, but
> I'm getting the feeling that we should just reset the other segment
> registers to known values on 32-bit.
>
> Also, why does the 32-bit code do
>
> loadsegment(es, ctxt->es);
>
> but the 64-bit code does
>
> asm volatile ("movw %0, %%es" :: "r" (ctxt->es));
>
> And look at that confusion between MSR_GS_BASE and MSR_KERNEL_GS_BASE
> all within the 64-bit case.
>
> In particular, note how we reload the %gs segment in between the two -
> wouldn't that mess with the currently active gs base if %gs can be
> non-zero?
>
> Christ, what a mess.
>
> So I think that whole sequence is garbage. It has been written as some
> kind of "save and restore registers", but that's not what it really
> then does - or what it should do.
>
> It should make sure to restore a sane kernel state, not some random
> register state.
>
> And the 32-bit and 64-bit code really should strive to be at least
> _sanely_ different, not this randomly and insanely different mess.
>
> But yes, Rafael's patch looks like the minimal one-liner. But I think
> we should do the %gs load early too for the 32-bit stack canary case,
> kind of like we need to do %fs for percpu base.
I'll try to get to this in a day or so -- is that okay? Or should we
do some trivial fix/revert and fix it for real next time around?
On Mon, Dec 11, 2017 at 10:41 AM, Andy Lutomirski <[email protected]> wrote:
>
> I'll try to get to this in a day or so -- is that okay? Or should we
> do some trivial fix/revert and fix it for real next time around?
I don't think we want some trivial fix/revert just to keep it working.
This code is too fragile as-is, and I think that "make it work" is
more than reverting. You did fix real issues on x86-64 with odd
segment use, for example.
Linus
Hi!
> > > ...should take 10 seconds or so.
> > I'm told 0day does *some* suspend/resume testing, but I think it's
> > pretty limited, partly because the kinds of machines it primarily
> > works on don't really support suspend/resume at all.
>
> currently, we're running suspend test on 1 platform only, with 64 bit
> kernel. suspend test will be enabled on more platforms (laptops) in
> next two weeks.
Thanks!
> > I'm also not sure
> > just how many of those machines are 32-bit at all..
>
> for this, I suppose it can be reproduced if we use 32-bit kernel and
> rootfs, right? Then it's easier to enable this in 0Day.
Yes, Intel cpus are pretty good at backwards compatibility, and most
problems are not subtle at all. So yes, 32-bit kernel / rootfs on
recent machine should be good for testing.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
<[email protected]> wrote:
>
> That said, this *all* smells wrong. Why is there a special
> fix_processor_context() function at all with different 32-bit and
> 64-bit behavior? This code is all written to be maximally confusing.
Hmm. Looking a bit more at this, I think it should be solved by:
- load the original read-write GDT early, along with the IDT.
We have already saved it off in __save_processor_state(), and it may
have already gotten loaded very early in at least some of the paths,
but it definitely hasn't gotten reloaded in all the paths (think
"suspend/resume testing" etc).
- add the LDT descriptor to the save area too, exactly like we
already have IDT/GDT.
Then, we can do "load_ldt()" early (along with IDT and GDT).
- now we can just load all the segments early, and get the percpu and
stack canary stuff without any special cases
- do NOT use "load_gs_index()", which does that swapgs dance (twice!)
and plays with interrupt state. Just load the segment register, and
then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
need for the swapgs dance.
- now that we have a fully working system, then the
"fix_processor_context()" code can do the "fancy" stuff like setting
up the RO fixmap GDT and re-initializing the TLB state. Those want
percpu stuff.
In other words, why not try to organize this so that we do a simple
and fairly mindless restore of the low-level state first? Avoid all
the "system is halfway up" crud.
Would that work for people? Andy?
Linus
On Tue, Dec 12, 2017 at 9:27 AM, Linus Torvalds
<[email protected]> wrote:
> On Sun, Dec 10, 2017 at 1:28 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> That said, this *all* smells wrong. Why is there a special
>> fix_processor_context() function at all with different 32-bit and
>> 64-bit behavior? This code is all written to be maximally confusing.
>
> Hmm. Looking a bit more at this, I think it should be solved by:
>
> - load the original read-write GDT early, along with the IDT.
>
> We have already saved it off in __save_processor_state(), and it may
> have already gotten loaded very early in at least some of the paths,
> but it definitely hasn't gotten reloaded in all the paths (think
> "suspend/resume testing" etc).
>
> - add the LDT descriptor to the save area too, exactly like we
> already have IDT/GDT.
>
> Then, we can do "load_ldt()" early (along with IDT and GDT).
>
> - now we can just load all the segments early, and get the percpu and
> stack canary stuff without any special cases
>
> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
> and plays with interrupt state. Just load the segment register, and
> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
> need for the swapgs dance.
Using what helper? On x86_64, it can fault, and IIRC we explicitly
don't allow loadsegment(gs, ...).
>
> - now that we have a fully working system, then the
> "fix_processor_context()" code can do the "fancy" stuff like setting
> up the RO fixmap GDT and re-initializing the TLB state. Those want
> percpu stuff.
>
> In other words, why not try to organize this so that we do a simple
> and fairly mindless restore of the low-level state first? Avoid all
> the "system is halfway up" crud.
>
> Would that work for people? Andy?
Other than the above, more or less.
But we should really do all the user segments last. They're not at
all needed for normal kernel execution, so I think they should be the
very last part.
I'll try to get to this today.
On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <[email protected]> wrote:
>>
>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>> and plays with interrupt state. Just load the segment register, and
>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>> need for the swapgs dance.
>
> Using what helper? On x86_64, it can fault, and IIRC we explicitly
> don't allow loadsegment(gs, ...).
Just do the loadsegment() thing. The fact that we don't have a gs
version of it is legacy - to catch bad users. It shouldn't stop us
from having good users.
That said - can it really fault? Because if it can, then why can't %fs
fault? And on x86-64, we just do
asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
and don't actually use 'loadsegment()' for _any_ of the segments. We
only do the fault protection on 32-bit.
In fact, we really should try to avoid taking faults here anyway,
shouldn't we? We haven't loaded enough of the context yet.
Hmm.
Maybe we should load only the fixed kernel segments at this point, and
then do all the loadsegment() of gs/fs in the later phase when we're
all set up.
THERE we can do the swapgs dance with interrupt tracing etc, because
*there* we actually are fully set up. I guess that means reloading the
FS/GS base MSR's,
Linus
On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <[email protected]> wrote:
>>>
>>> - do NOT use "load_gs_index()", which does that swapgs dance (twice!)
>>> and plays with interrupt state. Just load the segment register, and
>>> then do the wrmsrl() of the {FS,GS,KERNEL_GS}_BASE values. There is no
>>> need for the swapgs dance.
>>
>> Using what helper? On x86_64, it can fault, and IIRC we explicitly
>> don't allow loadsegment(gs, ...).
>
> Just do the loadsegment() thing. The fact that we don't have a gs
> version of it is legacy - to catch bad users. It shouldn't stop us
> from having good users.
>
> That said - can it really fault? Because if it can, then why can't %fs
> fault? And on x86-64, we just do
>
> asm volatile ("movw %0, %%fs" :: "r" (ctxt->fs));
>
> and don't actually use 'loadsegment()' for _any_ of the segments. We
> only do the fault protection on 32-bit.
>
> In fact, we really should try to avoid taking faults here anyway,
> shouldn't we? We haven't loaded enough of the context yet.
>
> Hmm.
>
> Maybe we should load only the fixed kernel segments at this point, and
> then do all the loadsegment() of gs/fs in the later phase when we're
> all set up.
>
> THERE we can do the swapgs dance with interrupt tracing etc, because
> *there* we actually are fully set up. I guess that means reloading the
> FS/GS base MSR's,
Like this?
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
I've barely tested it. It suspended and resumed once in a 64-bit VM.
It compiles on 32-bit.
(That link might not work for a little bit. I'm not sure what's up.)
On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <[email protected]> wrote:
>
> (That link might not work for a little bit. I'm not sure what's up.)
I think your link is just bogus.
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
works.
Anyway, the code looks much better to me.
Whether it _works_ is another matter, of course.
Linus
On Tue, Dec 12, 2017 at 2:33 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Dec 12, 2017 at 2:10 PM, Andy Lutomirski <[email protected]> wrote:
>>
>> (That link might not work for a little bit. I'm not sure what's up.)
>
> I think your link is just bogus.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> works.
If I click "commit" on that page, I get my link, and it's still busted.
>
> Anyway, the code looks much better to me.
>
> Whether it _works_ is another matter, of course.
>
> Linus
On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <[email protected]> wrote:
> Like this?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
>
> I've barely tested it. It suspended and resumed once in a 64-bit VM.
> It compiles on 32-bit.
>
I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both
64-bit and 32-bit work.
Tested-by: Jarkko Nikula <[email protected]>
* Jarkko Nikula <[email protected]> wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
> > On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
> > <[email protected]> wrote:
> > > On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <[email protected]> wrote:
> > Like this?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
> >
> > I've barely tested it. It suspended and resumed once in a 64-bit VM.
> > It compiles on 32-bit.
> >
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula <[email protected]>
Great, thanks for the testing!
Andy, mind sending the final fix with a changelog and the Reported-by/Tested-by
tags, etc?
Thanks,
Ingo
On Wed, Dec 13, 2017 at 3:16 AM, Jarkko Nikula
<[email protected]> wrote:
> On 12/13/2017 12:10 AM, Andy Lutomirski wrote:
>>
>> On Tue, Dec 12, 2017 at 10:36 AM, Linus Torvalds
>> <[email protected]> wrote:
>>>
>>> On Tue, Dec 12, 2017 at 10:05 AM, Andy Lutomirski <[email protected]>
>>> wrote:
>>
>> Like this?
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=cb855aa9679a15adbe43732f5854270de2b35856
>>
>> I've barely tested it. It suspended and resumed once in a 64-bit VM.
>> It compiles on 32-bit.
>>
> I tested with "rtcwake -s 5 -m mem" 2-3 times on a real machine. Both 64-bit
> and 32-bit work.
>
> Tested-by: Jarkko Nikula <[email protected]>
Thanks!
I've split the patch into three and put your Tested-By on all of them,
since the end state is exactly the same as what you tested. I'll
email them out once I test them myself :)
Hi!
> > But yes, Rafael's patch looks like the minimal one-liner. But I think
> > we should do the %gs load early too for the 32-bit stack canary case,
> > kind of like we need to do %fs for percpu base.
>
> I'll try to get to this in a day or so -- is that okay? Or should we
> do some trivial fix/revert and fix it for real next time around?
I can test the patch....
But given this is already "regression fix for x86-64 caused regression
on x86-32", I really believe we should merge trivial fix now, and do
the cleanups / nicer fixes sometime later?
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <[email protected]> wrote:
>
> But given this is already "regression fix for x86-64 caused regression
> on x86-32", I really believe we should merge trivial fix now, and do
> the cleanups / nicer fixes sometime later?
The fix patch was already posted, but in another thread (confusingly
with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
resume from ACPI S3").
Here:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
but it hasn't actually gotten to me yet (I think Luto made a new
version where he split it up into three smaller patches, and I'm
assuming it's in one of the -tip trees heading for me asap).
All the x86 people have been insanely busy with the crazy page table
isolation patches, that probably is slowing down things. But I was
expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
It's my busiest day of the week because everybody wants to get their
work out before the weekend).
Linus
On Thu, Dec 14, 2017 at 12:47 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <[email protected]> wrote:
>>
>> But given this is already "regression fix for x86-64 caused regression
>> on x86-32", I really believe we should merge trivial fix now, and do
>> the cleanups / nicer fixes sometime later?
>
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
>
> Here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).
>
> All the x86 people have been insanely busy with the crazy page table
> isolation patches, that probably is slowing down things. But I was
> expecting to get the -tip tree pulls tomorrow (.. because.. Friday.
> It's my busiest day of the week because everybody wants to get their
> work out before the weekend).
Nah, it's that I was busy with the stupid PTI stuff, and the original
version of the patch didn't compile on some configurations. I've
tested the version I just sent for suspend-to-ram and suspend-to-disk
on 32-bit and 64-bit in a couple of configurations.
On Thu 2017-12-14 12:47:56, Linus Torvalds wrote:
> On Thu, Dec 14, 2017 at 12:38 PM, Pavel Machek <[email protected]> wrote:
> >
> > But given this is already "regression fix for x86-64 caused regression
> > on x86-32", I really believe we should merge trivial fix now, and do
> > the cleanups / nicer fixes sometime later?
>
> The fix patch was already posted, but in another thread (confusingly
> with _almost_ the same subject: "Re: Linux 4.15-rc2: Regression in
> resume from ACPI S3").
>
> Here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes
>
> but it hasn't actually gotten to me yet (I think Luto made a new
> version where he split it up into three smaller patches, and I'm
> assuming it's in one of the -tip trees heading for me asap).
Ok.. It did not apply cleanly. ... but I see fresh patches in my
inbox, let me test those.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html