2008-06-30 23:47:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFT] x86 acpi: normalize segment descriptor register on resume

Hi,

The appended patch fixes a regression and is considered as 2.6.26 material.
Everyone having a box with working suspend to RAM is gently requested to test
it and verify if it doesn't break things.

The patch applies to the current -git.

Thanks,
Rafael

---
From: H. Peter Anvin <[email protected]>

x86 acpi: normalize segment descriptor register on resume

Some Dell laptops enter resume with apparent garbage in the segment
descriptor registers (almost certainly the result of a botched
transition from protected to real mode.) The only way to clean that
up is to enter protected mode ourselves and clean out the descriptor
registers.

This fixes resume on Dell XPS M1210 and Dell D620.

Reference: http://bugzilla.kernel.org/show_bug.cgi?id=10927

Signed-off-by: H. Peter Anvin <[email protected]>
Tested-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/acpi/realmode/wakeup.S | 38 ++++++++++++++++++++++++++++++++-
arch/x86/kernel/acpi/realmode/wakeup.h | 5 ++++
arch/x86/kernel/acpi/sleep.c | 16 +++++++++++++
drivers/acpi/sleep/main.c | 5 +---
4 files changed, 59 insertions(+), 5 deletions(-)

Index: linux-2.6/arch/x86/kernel/acpi/realmode/wakeup.S
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/realmode/wakeup.S
+++ linux-2.6/arch/x86/kernel/acpi/realmode/wakeup.S
@@ -5,6 +5,7 @@
#include <asm/msr-index.h>
#include <asm/page.h>
#include <asm/pgtable.h>
+#include <asm/processor-flags.h>

.code16
.section ".header", "a"
@@ -24,6 +25,11 @@ pmode_gdt: .quad 0
realmode_flags: .long 0
real_magic: .long 0
trampoline_segment: .word 0
+_pad1: .byte 0
+wakeup_jmp: .byte 0xea /* ljmpw */
+wakeup_jmp_off: .word 3f
+wakeup_jmp_seg: .word 0
+wakeup_gdt: .quad 0, 0, 0
signature: .long 0x51ee1111

.text
@@ -34,11 +40,34 @@ _start:
cli
cld

+ /* Apparently some dimwit BIOS programmers don't know how to
+ program a PM to RM transition, and we might end up here with
+ junk in the data segment descriptor registers. The only way
+ to repair that is to go into PM and fix it ourselves... */
+ movw $16, %cx
+ lgdtl %cs:wakeup_gdt
+ movl %cr0, %eax
+ orb $X86_CR0_PE, %al
+ movl %eax, %cr0
+ jmp 1f
+1: ljmpw $8, $2f
+2:
+ movw %cx, %ds
+ movw %cx, %es
+ movw %cx, %ss
+ movw %cx, %fs
+ movw %cx, %gs
+
+ andb $~X86_CR0_PE, %al
+ movl %eax, %cr0
+ jmp wakeup_jmp
+3:
/* Set up segments */
movw %cs, %ax
movw %ax, %ds
movw %ax, %es
movw %ax, %ss
+ lidtl wakeup_idt

movl $wakeup_stack_end, %esp

@@ -98,7 +127,14 @@ bogus_real_magic:
jmp 1b

.data
- .balign 4
+ .balign 8
+
+ /* This is the standard real-mode IDT */
+wakeup_idt:
+ .word 0xffff /* limit */
+ .long 0 /* address */
+ .word 0
+
.globl HEAP, heap_end
HEAP:
.long wakeup_heap
Index: linux-2.6/arch/x86/kernel/acpi/realmode/wakeup.h
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/realmode/wakeup.h
+++ linux-2.6/arch/x86/kernel/acpi/realmode/wakeup.h
@@ -24,6 +24,11 @@ struct wakeup_header {
u32 realmode_flags;
u32 real_magic;
u16 trampoline_segment; /* segment with trampoline code, 64-bit only */
+ u8 _pad1;
+ u8 wakeup_jmp;
+ u16 wakeup_jmp_off;
+ u16 wakeup_jmp_seg;
+ u64 wakeup_gdt[3];
u32 signature; /* To check we have correct structure */
} __attribute__((__packed__));

Index: linux-2.6/arch/x86/kernel/acpi/sleep.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/acpi/sleep.c
+++ linux-2.6/arch/x86/kernel/acpi/sleep.c
@@ -50,6 +50,20 @@ int acpi_save_state_mem(void)

header->video_mode = saved_video_mode;

+ header->wakeup_jmp_seg = acpi_wakeup_address >> 4;
+ /* GDT[0]: GDT self-pointer */
+ header->wakeup_gdt[0] =
+ (u64)(sizeof(header->wakeup_gdt) - 1) +
+ ((u64)(acpi_wakeup_address +
+ ((char *)&header->wakeup_gdt - (char *)acpi_realmode))
+ << 16);
+ /* GDT[1]: real-mode-like code segment */
+ header->wakeup_gdt[1] = (0x009bULL << 40) +
+ ((u64)acpi_wakeup_address << 16) + 0xffff;
+ /* GDT[2]: real-mode-like data segment */
+ header->wakeup_gdt[2] = (0x0093ULL << 40) +
+ ((u64)acpi_wakeup_address << 16) + 0xffff;
+
#ifndef CONFIG_64BIT
store_gdt((struct desc_ptr *)&header->pmode_gdt);

@@ -111,7 +125,7 @@ void __init acpi_reserve_bootmem(void)
return;
}

- acpi_wakeup_address = acpi_realmode;
+ acpi_wakeup_address = virt_to_phys((void *)acpi_realmode);
}


Index: linux-2.6/drivers/acpi/sleep/main.c
===================================================================
--- linux-2.6.orig/drivers/acpi/sleep/main.c
+++ linux-2.6/drivers/acpi/sleep/main.c
@@ -32,9 +32,8 @@ static int acpi_sleep_prepare(u32 acpi_s
if (!acpi_wakeup_address) {
return -EFAULT;
}
- acpi_set_firmware_waking_vector((acpi_physical_address)
- virt_to_phys((void *)
- acpi_wakeup_address));
+ acpi_set_firmware_waking_vector(
+ (acpi_physical_address)acpi_wakeup_address);

}
ACPI_FLUSH_CPU_CACHE();


2008-07-01 00:06:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> Hi,
>
> The appended patch fixes a regression and is considered as 2.6.26 material.
> Everyone having a box with working suspend to RAM is gently requested to test
> it and verify if it doesn't break things.
>
> The patch applies to the current -git.
>

Given the Elan report from earlier today, I'm worried this patch might
have the same problem. If so, it's a relatively easy change; I'll try
to have an updated patch later this evening.

-hpa

2008-07-01 06:32:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


* Rafael J. Wysocki <[email protected]> wrote:

> Hi,
>
> The appended patch fixes a regression and is considered as 2.6.26
> material. Everyone having a box with working suspend to RAM is gently
> requested to test it and verify if it doesn't break things.
>
> The patch applies to the current -git.

it's been under testing in tip/out-of-tree for about a week:

| commit ee901dc1b9ab94a37ba2efc296fe9ba72bc75adf
| Author: H. Peter Anvin <[email protected]>
| AuthorDate: Tue Jun 24 23:03:48 2008 +0200
| Commit: Ingo Molnar <[email protected]>
| CommitDate: Wed Jun 25 18:48:13 2008 +0200
|
| x86 ACPI: normalize segment descriptor register on resume

no problems caused by it so far.

Ingo

2008-07-01 06:59:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Ingo Molnar wrote:
> * Rafael J. Wysocki <[email protected]> wrote:
>
>> Hi,
>>
>> The appended patch fixes a regression and is considered as 2.6.26
>> material. Everyone having a box with working suspend to RAM is gently
>> requested to test it and verify if it doesn't break things.
>>
>> The patch applies to the current -git.
>
> it's been under testing in tip/out-of-tree for about a week:
>
> | commit ee901dc1b9ab94a37ba2efc296fe9ba72bc75adf
> | Author: H. Peter Anvin <[email protected]>
> | AuthorDate: Tue Jun 24 23:03:48 2008 +0200
> | Commit: Ingo Molnar <[email protected]>
> | CommitDate: Wed Jun 25 18:48:13 2008 +0200
> |
> | x86 ACPI: normalize segment descriptor register on resume
>
> no problems caused by it so far.
>

Here is the incremental patch which should stick "strictly to the
script" of ljmp immediately after writing CR0.PE. This should be done
to the boot code as well; I'm waiting for confirmation from the Elan
original reporter before submitting that patch.

I decided to make this an incremental patch to make it bisectable versus
the other one, however, it should probably be considered the right thing.

Note: I have not tested this beyond compilation, I'm afraid.

-hpa


Attachments:
0002-x86-acpi-on-wakeup-ljmp-directly-after-writing-CR0.patch (3.45 kB)

2008-07-01 07:07:20

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Ingo Molnar wrote:
> * Rafael J. Wysocki <[email protected]> wrote:
>
>> Hi,
>>
>> The appended patch fixes a regression and is considered as 2.6.26
>> material. Everyone having a box with working suspend to RAM is gently
>> requested to test it and verify if it doesn't break things.
>>
>> The patch applies to the current -git.
>
> it's been under testing in tip/out-of-tree for about a week:
>
> | commit ee901dc1b9ab94a37ba2efc296fe9ba72bc75adf
> | Author: H. Peter Anvin <[email protected]>
> | AuthorDate: Tue Jun 24 23:03:48 2008 +0200
> | Commit: Ingo Molnar <[email protected]>
> | CommitDate: Wed Jun 25 18:48:13 2008 +0200
> |
> | x86 ACPI: normalize segment descriptor register on resume
>
> no problems caused by it so far.
>

It still seems incredibly risky to push this for 2.6.26, especially
given the Elan revelation. I think it needs to be tested on the 2.6.27
track, and then possibly be pushed back via the 2.6.26-stable route.

-hpa

2008-07-01 07:20:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


* Rafael J. Wysocki <[email protected]> wrote:

> Hi,
>
> The appended patch fixes a regression and is considered as 2.6.26
> material. Everyone having a box with working suspend to RAM is gently
> requested to test it and verify if it doesn't break things.
>
> The patch applies to the current -git.

The fix is _really_ tempting, but i think it's 2.6.26.1 material at the
earliest. I just counted about 8 red flag items in that commit:

- "assembly code"
- "fresh change"
- "suspend/resume"
- "real-mode code"
- "ACPI"
- "SMM"
- "CPU erratas"
- "boot code"

I'd say it's probably 90% fine, but it's just too much risk at this
stage i think. The regression was only found 2 weeks ago, and the commit
that broke it was upstream for 2 months (and was under testing for about
4 months).

[ We have to try to shorten the test cycle for such problems. Hopefully
in v2.6.27 we'll have CONFIG_PM_TEST_SUSPEND=y :-) ]

Ingo

2008-07-01 07:30:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Tue, 1 Jul 2008 09:20:24 +0200 Ingo Molnar <[email protected]> wrote:

> * Rafael J. Wysocki <[email protected]> wrote:
>
> > Hi,
> >
> > The appended patch fixes a regression and is considered as 2.6.26
> > material. Everyone having a box with working suspend to RAM is gently
> > requested to test it and verify if it doesn't break things.
> >
> > The patch applies to the current -git.
>
> The fix is _really_ tempting, but i think it's 2.6.26.1 material at the
> earliest. I just counted about 8 red flag items in that commit:
>
> - "assembly code"
> - "fresh change"
> - "suspend/resume"
> - "real-mode code"
> - "ACPI"
> - "SMM"
> - "CPU erratas"
> - "boot code"
>
> I'd say it's probably 90% fine, but it's just too much risk at this
> stage i think. The regression was only found 2 weeks ago, and the commit
> that broke it was upstream for 2 months (and was under testing for about
> 4 months).

Merge it into 2.6.27-rc1 and add Cc: <[email protected]> to the changelog with
a note "needed in 2.6.26.x after a couple of weeks testing in mainline" or
something like that.

I expect 2.6.25.x will be maintained for a while yet too...

2008-07-01 07:46:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


* Andrew Morton <[email protected]> wrote:

> On Tue, 1 Jul 2008 09:20:24 +0200 Ingo Molnar <[email protected]> wrote:
>
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > The appended patch fixes a regression and is considered as 2.6.26
> > > material. Everyone having a box with working suspend to RAM is gently
> > > requested to test it and verify if it doesn't break things.
> > >
> > > The patch applies to the current -git.
> >
> > The fix is _really_ tempting, but i think it's 2.6.26.1 material at the
> > earliest. I just counted about 8 red flag items in that commit:
> >
> > - "assembly code"
> > - "fresh change"
> > - "suspend/resume"
> > - "real-mode code"
> > - "ACPI"
> > - "SMM"
> > - "CPU erratas"
> > - "boot code"
> >
> > I'd say it's probably 90% fine, but it's just too much risk at this
> > stage i think. The regression was only found 2 weeks ago, and the commit
> > that broke it was upstream for 2 months (and was under testing for about
> > 4 months).
>
> Merge it into 2.6.27-rc1 and add Cc: <[email protected]> to the changelog with
> a note "needed in 2.6.26.x after a couple of weeks testing in mainline" or
> something like that.

ok.

> I expect 2.6.25.x will be maintained for a while yet too...

2.6.25.x is not affected by the suspend+resume aspect of this problem.

Ingo

2008-07-01 09:18:47

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Hi!

>>> The appended patch fixes a regression and is considered as 2.6.26
>>> material. Everyone having a box with working suspend to RAM is gently
>>> requested to test it and verify if it doesn't break things.
>>>
>>> The patch applies to the current -git.
>>
>> it's been under testing in tip/out-of-tree for about a week:
>>
>> | commit ee901dc1b9ab94a37ba2efc296fe9ba72bc75adf
>> | Author: H. Peter Anvin <[email protected]>
>> | AuthorDate: Tue Jun 24 23:03:48 2008 +0200
>> | Commit: Ingo Molnar <[email protected]>
>> | CommitDate: Wed Jun 25 18:48:13 2008 +0200
>> |
>> | x86 ACPI: normalize segment descriptor register on resume
>>
>> no problems caused by it so far.
>>
>
> Here is the incremental patch which should stick "strictly to the script"
> of ljmp immediately after writing CR0.PE. This should be done to the boot
> code as well; I'm waiting for confirmation from the Elan original reporter
> before submitting that patch.
>
> I decided to make this an incremental patch to make it bisectable versus
> the other one, however, it should probably be considered the right thing.
>
> Note: I have not tested this beyond compilation, I'm afraid.

I had to apply it by hand, then I tested it along with Rafael's patch
and it does not seem to break anything here. So feel free to add my
ACK to both.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-07-01 13:03:32

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


>
> It still seems incredibly risky to push this for 2.6.26, especially
> given the Elan revelation.

Do Elans even support S3?

> I think it needs to be tested on the 2.6.27
> track, and then possibly be pushed back via the 2.6.26-stable route.

I'm just not sure how many suspend/resume cycles people really do
on a early (pre -rc) mainline kernel (or in linux-next for that
matter). You usually have to install on a laptop and actually
use it.

Since this code is only executed on resume some directed testing
would be better. That is what Rafael asked for in this mail.

I think it would be ok for .26 if we can get confirmation it works
on a few systems with S3 suspend/resume.

-Andi

2008-07-01 15:57:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andi Kleen wrote:
>> It still seems incredibly risky to push this for 2.6.26, especially
>> given the Elan revelation.
>
> Do Elans even support S3?

I don't know if they do, but I don't know offhand the extent of machines
that may have that problem, especially since Intel now document it as
"failures are readily seen".

>> I think it needs to be tested on the 2.6.27
>> track, and then possibly be pushed back via the 2.6.26-stable route.
>
> I'm just not sure how many suspend/resume cycles people really do
> on a early (pre -rc) mainline kernel (or in linux-next for that
> matter). You usually have to install on a laptop and actually
> use it.
>
> Since this code is only executed on resume some directed testing
> would be better. That is what Rafael asked for in this mail.

The issue is mostly if it breaks some obscure system. I have put it on
my laptop, Ingo has it on this test system with a suspend-testing cycle,
and so on, but the number of systems exposed is going to be small.

> I think it would be ok for .26 if we can get confirmation it works
> on a few systems with S3 suspend/resume.

That we already know it does, but it took a long time even until the
regression came to light.

-hpa

2008-07-01 16:21:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

H. Peter Anvin wrote:
> Andi Kleen wrote:
>>> It still seems incredibly risky to push this for 2.6.26, especially
>>> given the Elan revelation.
>>
>> Do Elans even support S3?
>
> I don't know if they do, but I don't know offhand the extent of machines
> that may have that problem, especially since Intel now document it as
> "failures are readily seen".

What document is that exactly?


>
>>> I think it needs to be tested on the 2.6.27
>>> track, and then possibly be pushed back via the 2.6.26-stable route.
>>
>> I'm just not sure how many suspend/resume cycles people really do
>> on a early (pre -rc) mainline kernel (or in linux-next for that
>> matter). You usually have to install on a laptop and actually
>> use it.
>>
>> Since this code is only executed on resume some directed testing
>> would be better. That is what Rafael asked for in this mail.
>
> The issue is mostly if it breaks some obscure system. I have put it on
> my laptop, Ingo has it on this test system with a suspend-testing cycle,
> and so on, but the number of systems exposed is going to be small.

Right now it looks like a significant number of Dell laptops
are affected by this regression. That's a serious issue.

-Andi

2008-07-01 17:29:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andi Kleen wrote:
> H. Peter Anvin wrote:
>> Andi Kleen wrote:
>>>> It still seems incredibly risky to push this for 2.6.26, especially
>>>> given the Elan revelation.
>>> Do Elans even support S3?
>> I don't know if they do, but I don't know offhand the extent of machines
>> that may have that problem, especially since Intel now document it as
>> "failures are readily seen".
>
> What document is that exactly?
>

Per Jeremy's recent email, volume 3a, section 8.9.1:

> Random failures can occur if other instructions exist between steps
> 3 and 4 above. Failures will be readily seen in some situations,
> such as when instructions that reference memory are inserted between
> steps 3 and 4 while in system management mode.

-hpa

2008-07-01 20:37:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Tuesday, 1 of July 2008, H. Peter Anvin wrote:
> Ingo Molnar wrote:
> > * Rafael J. Wysocki <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> The appended patch fixes a regression and is considered as 2.6.26
> >> material. Everyone having a box with working suspend to RAM is gently
> >> requested to test it and verify if it doesn't break things.
> >>
> >> The patch applies to the current -git.
> >
> > it's been under testing in tip/out-of-tree for about a week:
> >
> > | commit ee901dc1b9ab94a37ba2efc296fe9ba72bc75adf
> > | Author: H. Peter Anvin <[email protected]>
> > | AuthorDate: Tue Jun 24 23:03:48 2008 +0200
> > | Commit: Ingo Molnar <[email protected]>
> > | CommitDate: Wed Jun 25 18:48:13 2008 +0200
> > |
> > | x86 ACPI: normalize segment descriptor register on resume
> >
> > no problems caused by it so far.
> >
>
> Here is the incremental patch which should stick "strictly to the
> script" of ljmp immediately after writing CR0.PE. This should be done
> to the boot code as well; I'm waiting for confirmation from the Elan
> original reporter before submitting that patch.
>
> I decided to make this an incremental patch to make it bisectable versus
> the other one, however, it should probably be considered the right thing.
>
> Note: I have not tested this beyond compilation, I'm afraid.

I have tested it on the nx6325 (64-bit) on top of the today's linux-next. It
works fine for me.

Thanks,
Rafael

2008-07-01 20:42:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


> I have tested it on the nx6325 (64-bit) on top of the today's linux-next. It
> works fine for me.

Could a few more people with different laptops please test too?

-Andi

2008-07-01 20:49:07

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Tuesday, 1 of July 2008, Andi Kleen wrote:
>
> > I have tested it on the nx6325 (64-bit) on top of the today's linux-next. It
> > works fine for me.
>
> Could a few more people with different laptops please test too?

Well, I think we can put it into the 2.6.27 queue in the hope it gets more
testing in linux-next over time.

Thanks,
Rafael

2008-07-01 20:52:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> On Tuesday, 1 of July 2008, Andi Kleen wrote:
>>> I have tested it on the nx6325 (64-bit) on top of the today's linux-next. It
>>> works fine for me.
>> Could a few more people with different laptops please test too?
>
> Well, I think we can put it into the 2.6.27 queue in the hope it gets more
> testing in linux-next over time.

I'm just not sure how many people really put linux-next on their laptops.
Better would be some directed testing.

We'll see if that kernel-testers list actually does something or is just a placebo.

-Andi

2008-07-12 06:36:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

My Lenovo X61s fails to resume if I suspend it from within X, on both
2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
wireless-testing with 4b4f7280 reverted. My in-progress bisect between
-rc8 and -rc9 is also consistent with this being the problem.

The symptom is that, when I push the power button to resume, the hard
drive light turns on, the fan turns on, then the hard drive light turns
off, the sleep light stays on, and the fan keeps running. Sometimes the
battery light will blink off very briefly (1/4 sec, maybe) every few
seconds. The system is locked hard at this point.

I'm using Ubuntu Hardy userspace.

--Andy

Andi Kleen wrote:
>> I have tested it on the nx6325 (64-bit) on top of the today's linux-next. It
>> works fine for me.
>
> Could a few more people with different laptops please test too?
>
> -Andi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2008-07-12 12:09:04

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andy Lutomirski wrote:
> My Lenovo X61s fails to resume if I suspend it from within X, on both
> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> -rc8 and -rc9 is also consistent with this being the problem.

With "this" do you mean the patch being applied or not applied?

As in does it fail with the patch applied?

You can confirm that by only applying the patch and testing.

-Andi

2008-07-12 15:08:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andi Kleen wrote:
> Andy Lutomirski wrote:
>> My Lenovo X61s fails to resume if I suspend it from within X, on both
>> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
>> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
>> -rc8 and -rc9 is also consistent with this being the problem.
>
> With "this" do you mean the patch being applied or not applied?
>
> As in does it fail with the patch applied?
>
> You can confirm that by only applying the patch and testing.

The tree as of 4b4f2780 (the title patch of this thread) fails to
resume. Its parent works fine. Which makes this patch the culprit. (I
just tried both.)

--Andy

2008-07-12 18:49:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> My Lenovo X61s fails to resume if I suspend it from within X, on both
> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> -rc8 and -rc9 is also consistent with this being the problem.
>
> The symptom is that, when I push the power button to resume, the hard
> drive light turns on, the fan turns on, then the hard drive light turns
> off, the sleep light stays on, and the fan keeps running. Sometimes the
> battery light will blink off very briefly (1/4 sec, maybe) every few
> seconds. The system is locked hard at this point.
>
> I'm using Ubuntu Hardy userspace.

Well, that's bad.

There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
for this bug and you've just confirmed my suspicion that this particular
commit is to blame.

Can you please see if the appended patch changes anything?

Thanks,
Rafael

---
From: H. Peter Anvin <[email protected]>
Date: Mon, 30 Jun 2008 23:48:35 -0700
Subject: [PATCH] x86 acpi: on wakeup, ljmp directly after writing CR0.PE

Impact: possible resume failures on AMD Elan, others?

Intel documents that writing cr0 should be immediately followed by a
ljmp, and that "failures are readily seen" if the processor enters SMM
at this point. We believe this has been observed on the AMD Elan, so
stick strictly to the script and do an ljmp immediately after a change
to CR0.PE in all circumstances.

Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/kernel/acpi/realmode/wakeup.S | 11 ++++-------
arch/x86/kernel/acpi/realmode/wakeup.h | 6 ++----
arch/x86/kernel/acpi/sleep.c | 4 +++-
3 files changed, 9 insertions(+), 12 deletions(-)

Index: linux-next/arch/x86/kernel/acpi/realmode/wakeup.S
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/realmode/wakeup.S
+++ linux-next/arch/x86/kernel/acpi/realmode/wakeup.S
@@ -25,10 +25,8 @@ pmode_gdt: .quad 0
realmode_flags: .long 0
real_magic: .long 0
trampoline_segment: .word 0
-_pad1: .byte 0
-wakeup_jmp: .byte 0xea /* ljmpw */
-wakeup_jmp_off: .word 3f
-wakeup_jmp_seg: .word 0
+wakeup_seg_ptr: .word 3f-2 /* the segment in the ljmpw */
+_pad: .long 0
wakeup_gdt: .quad 0, 0, 0
signature: .long 0x51ee1111

@@ -49,8 +47,7 @@ _start:
movl %cr0, %eax
orb $X86_CR0_PE, %al
movl %eax, %cr0
- jmp 1f
-1: ljmpw $8, $2f
+ ljmpw $8, $2f
2:
movw %cx, %ds
movw %cx, %es
@@ -60,7 +57,7 @@ _start:

andb $~X86_CR0_PE, %al
movl %eax, %cr0
- jmp wakeup_jmp
+ ljmpw $0, $3f
3:
/* Set up segments */
movw %cs, %ax
Index: linux-next/arch/x86/kernel/acpi/realmode/wakeup.h
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/realmode/wakeup.h
+++ linux-next/arch/x86/kernel/acpi/realmode/wakeup.h
@@ -24,10 +24,8 @@ struct wakeup_header {
u32 realmode_flags;
u32 real_magic;
u16 trampoline_segment; /* segment with trampoline code, 64-bit only */
- u8 _pad1;
- u8 wakeup_jmp;
- u16 wakeup_jmp_off;
- u16 wakeup_jmp_seg;
+ u16 wakeup_seg_ptr;
+ u32 _pad;
u64 wakeup_gdt[3];
u32 signature; /* To check we have correct structure */
} __attribute__((__packed__));
Index: linux-next/arch/x86/kernel/acpi/sleep.c
===================================================================
--- linux-next.orig/arch/x86/kernel/acpi/sleep.c
+++ linux-next/arch/x86/kernel/acpi/sleep.c
@@ -34,6 +34,7 @@ static char temp_stack[10240];
int acpi_save_state_mem(void)
{
struct wakeup_header *header;
+ u16 *wakeup_seg;

if (!acpi_realmode) {
printk(KERN_ERR "Could not allocate memory during boot, "
@@ -43,6 +44,7 @@ int acpi_save_state_mem(void)
memcpy((void *)acpi_realmode, &wakeup_code_start, WAKEUP_SIZE);

header = (struct wakeup_header *)(acpi_realmode + HEADER_OFFSET);
+ wakeup_seg = (u16 *)(acpi_realmode + header->wakeup_seg_ptr);
if (header->signature != 0x51ee1111) {
printk(KERN_ERR "wakeup header does not match\n");
return -EINVAL;
@@ -50,7 +52,7 @@ int acpi_save_state_mem(void)

header->video_mode = saved_video_mode;

- header->wakeup_jmp_seg = acpi_wakeup_address >> 4;
+ *wakeup_seg = acpi_wakeup_address >> 4;
/* GDT[0]: GDT self-pointer */
header->wakeup_gdt[0] =
(u64)(sizeof(header->wakeup_gdt) - 1) +

2008-07-12 20:31:18

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> On Saturday, 12 of July 2008, Andy Lutomirski wrote:
>> My Lenovo X61s fails to resume if I suspend it from within X, on both
>> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
>> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
>> -rc8 and -rc9 is also consistent with this being the problem.
>>
>> The symptom is that, when I push the power button to resume, the hard
>> drive light turns on, the fan turns on, then the hard drive light turns
>> off, the sleep light stays on, and the fan keeps running. Sometimes the
>> battery light will blink off very briefly (1/4 sec, maybe) every few
>> seconds. The system is locked hard at this point.
>>
>> I'm using Ubuntu Hardy userspace.
>
> Well, that's bad.
>
> There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> for this bug and you've just confirmed my suspicion that this particular
> commit is to blame.
>
> Can you please see if the appended patch changes anything?

I suspended and resumed OK. Then I rebooted and tried again, and it failed.

--Andy

>
> Thanks,
> Rafael
>
> ---
> From: H. Peter Anvin <[email protected]>
> Date: Mon, 30 Jun 2008 23:48:35 -0700
> Subject: [PATCH] x86 acpi: on wakeup, ljmp directly after writing CR0.PE
>
> Impact: possible resume failures on AMD Elan, others?
>
> Intel documents that writing cr0 should be immediately followed by a
> ljmp, and that "failures are readily seen" if the processor enters SMM
> at this point. We believe this has been observed on the AMD Elan, so
> stick strictly to the script and do an ljmp immediately after a change
> to CR0.PE in all circumstances.
>
> Signed-off-by: H. Peter Anvin <[email protected]>
> ---
> arch/x86/kernel/acpi/realmode/wakeup.S | 11 ++++-------
> arch/x86/kernel/acpi/realmode/wakeup.h | 6 ++----
> arch/x86/kernel/acpi/sleep.c | 4 +++-
> 3 files changed, 9 insertions(+), 12 deletions(-)
>
> Index: linux-next/arch/x86/kernel/acpi/realmode/wakeup.S
> ===================================================================
> --- linux-next.orig/arch/x86/kernel/acpi/realmode/wakeup.S
> +++ linux-next/arch/x86/kernel/acpi/realmode/wakeup.S
> @@ -25,10 +25,8 @@ pmode_gdt: .quad 0
> realmode_flags: .long 0
> real_magic: .long 0
> trampoline_segment: .word 0
> -_pad1: .byte 0
> -wakeup_jmp: .byte 0xea /* ljmpw */
> -wakeup_jmp_off: .word 3f
> -wakeup_jmp_seg: .word 0
> +wakeup_seg_ptr: .word 3f-2 /* the segment in the ljmpw */
> +_pad: .long 0
> wakeup_gdt: .quad 0, 0, 0
> signature: .long 0x51ee1111
>
> @@ -49,8 +47,7 @@ _start:
> movl %cr0, %eax
> orb $X86_CR0_PE, %al
> movl %eax, %cr0
> - jmp 1f
> -1: ljmpw $8, $2f
> + ljmpw $8, $2f
> 2:
> movw %cx, %ds
> movw %cx, %es
> @@ -60,7 +57,7 @@ _start:
>
> andb $~X86_CR0_PE, %al
> movl %eax, %cr0
> - jmp wakeup_jmp
> + ljmpw $0, $3f
> 3:
> /* Set up segments */
> movw %cs, %ax
> Index: linux-next/arch/x86/kernel/acpi/realmode/wakeup.h
> ===================================================================
> --- linux-next.orig/arch/x86/kernel/acpi/realmode/wakeup.h
> +++ linux-next/arch/x86/kernel/acpi/realmode/wakeup.h
> @@ -24,10 +24,8 @@ struct wakeup_header {
> u32 realmode_flags;
> u32 real_magic;
> u16 trampoline_segment; /* segment with trampoline code, 64-bit only */
> - u8 _pad1;
> - u8 wakeup_jmp;
> - u16 wakeup_jmp_off;
> - u16 wakeup_jmp_seg;
> + u16 wakeup_seg_ptr;
> + u32 _pad;
> u64 wakeup_gdt[3];
> u32 signature; /* To check we have correct structure */
> } __attribute__((__packed__));
> Index: linux-next/arch/x86/kernel/acpi/sleep.c
> ===================================================================
> --- linux-next.orig/arch/x86/kernel/acpi/sleep.c
> +++ linux-next/arch/x86/kernel/acpi/sleep.c
> @@ -34,6 +34,7 @@ static char temp_stack[10240];
> int acpi_save_state_mem(void)
> {
> struct wakeup_header *header;
> + u16 *wakeup_seg;
>
> if (!acpi_realmode) {
> printk(KERN_ERR "Could not allocate memory during boot, "
> @@ -43,6 +44,7 @@ int acpi_save_state_mem(void)
> memcpy((void *)acpi_realmode, &wakeup_code_start, WAKEUP_SIZE);
>
> header = (struct wakeup_header *)(acpi_realmode + HEADER_OFFSET);
> + wakeup_seg = (u16 *)(acpi_realmode + header->wakeup_seg_ptr);
> if (header->signature != 0x51ee1111) {
> printk(KERN_ERR "wakeup header does not match\n");
> return -EINVAL;
> @@ -50,7 +52,7 @@ int acpi_save_state_mem(void)
>
> header->video_mode = saved_video_mode;
>
> - header->wakeup_jmp_seg = acpi_wakeup_address >> 4;
> + *wakeup_seg = acpi_wakeup_address >> 4;
> /* GDT[0]: GDT self-pointer */
> header->wakeup_gdt[0] =
> (u64)(sizeof(header->wakeup_gdt) - 1) +

2008-07-12 20:39:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> On Saturday, 12 of July 2008, Andy Lutomirski wrote:
>> My Lenovo X61s fails to resume if I suspend it from within X, on both
>> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
>> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
>> -rc8 and -rc9 is also consistent with this being the problem.
>>
>> The symptom is that, when I push the power button to resume, the hard
>> drive light turns on, the fan turns on, then the hard drive light turns
>> off, the sleep light stays on, and the fan keeps running. Sometimes the
>> battery light will blink off very briefly (1/4 sec, maybe) every few
>> seconds. The system is locked hard at this point.
>>
>> I'm using Ubuntu Hardy userspace.
>
> Well, that's bad.
>
> There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> for this bug and you've just confirmed my suspicion that this particular
> commit is to blame.
>
> Can you please see if the appended patch changes anything?

More correctly:

If I suspend by typing pm-suspend or echo mem >/sys/power/state, then it
resumes just fine. If I log in to Gnome and push the suspend button,
then it does not resume. This seems to be the case with or without your
patch.

-rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
I suspend.

--Andy

2008-07-12 20:46:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> >> My Lenovo X61s fails to resume if I suspend it from within X, on both
> >> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> >> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> >> -rc8 and -rc9 is also consistent with this being the problem.
> >>
> >> The symptom is that, when I push the power button to resume, the hard
> >> drive light turns on, the fan turns on, then the hard drive light turns
> >> off, the sleep light stays on, and the fan keeps running. Sometimes the
> >> battery light will blink off very briefly (1/4 sec, maybe) every few
> >> seconds. The system is locked hard at this point.
> >>
> >> I'm using Ubuntu Hardy userspace.
> >
> > Well, that's bad.
> >
> > There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> > for this bug and you've just confirmed my suspicion that this particular
> > commit is to blame.
> >
> > Can you please see if the appended patch changes anything?
>
> I suspended and resumed OK. Then I rebooted and tried again, and it failed.

That's very strange.

Well, is suspend/resume 100% reliable without commit
4b4f7280d7fd1feeff134c2cf2db32fd583b6c29 ?

Also, is the system 64-bit or 32-bit and do you use any user-space quirks to
bring the graphics to life during resume?

And what does happen if you suspend it from the console (no X)?

Rafael

2008-07-12 20:51:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> >> My Lenovo X61s fails to resume if I suspend it from within X, on both
> >> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> >> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> >> -rc8 and -rc9 is also consistent with this being the problem.
> >>
> >> The symptom is that, when I push the power button to resume, the hard
> >> drive light turns on, the fan turns on, then the hard drive light turns
> >> off, the sleep light stays on, and the fan keeps running. Sometimes the
> >> battery light will blink off very briefly (1/4 sec, maybe) every few
> >> seconds. The system is locked hard at this point.
> >>
> >> I'm using Ubuntu Hardy userspace.
> >
> > Well, that's bad.
> >
> > There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> > for this bug and you've just confirmed my suspicion that this particular
> > commit is to blame.
> >
> > Can you please see if the appended patch changes anything?
>
> More correctly:
>
> If I suspend by typing pm-suspend or echo mem >/sys/power/state, then it
> resumes just fine. If I log in to Gnome and push the suspend button,
> then it does not resume. This seems to be the case with or without your
> patch.

Is there an Intel graphics in your box?

> -rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
> I suspend.

That's _really_ strange.

In fact I have only one explanation, which is that the Gnome suspend button
causes some user-space quirks to be applied, which are harmful and break the
resume. Also, without commit 4b4f7280 those quirks might have not been really
executed. Peter, does it sound reasonable?

Thanks,
Rafael

2008-07-12 23:11:21

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> On Saturday, 12 of July 2008, Andy Lutomirski wrote:
>> Rafael J. Wysocki wrote:
>>> On Saturday, 12 of July 2008, Andy Lutomirski wrote:
>>>> My Lenovo X61s fails to resume if I suspend it from within X, on both
>>>> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
>>>> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
>>>> -rc8 and -rc9 is also consistent with this being the problem.
>>>>
>>>> The symptom is that, when I push the power button to resume, the hard
>>>> drive light turns on, the fan turns on, then the hard drive light turns
>>>> off, the sleep light stays on, and the fan keeps running. Sometimes the
>>>> battery light will blink off very briefly (1/4 sec, maybe) every few
>>>> seconds. The system is locked hard at this point.
>>>>
>>>> I'm using Ubuntu Hardy userspace.
>>> Well, that's bad.
>>>
>>> There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
>>> for this bug and you've just confirmed my suspicion that this particular
>>> commit is to blame.
>>>
>>> Can you please see if the appended patch changes anything?
>> More correctly:
>>
>> If I suspend by typing pm-suspend or echo mem >/sys/power/state, then it
>> resumes just fine. If I log in to Gnome and push the suspend button,
>> then it does not resume. This seems to be the case with or without your
>> patch.
>
> Is there an Intel graphics in your box?

Yes.

>
>> -rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
>> I suspend.
>
> That's _really_ strange.
>
> In fact I have only one explanation, which is that the Gnome suspend button
> causes some user-space quirks to be applied, which are harmful and break the
> resume. Also, without commit 4b4f7280 those quirks might have not been really
> executed. Peter, does it sound reasonable?

Bingo. It's a HAL quirk.

Testing from the console (not X):

With 4b4f7280:
# echo mem >/sys/power/state -- works fine

# echo 3 >/proc/sys/kernel/acpi_video_flags
# echo mem >/sys/power state -- fails to resume

Without 4b4f7280:
# echo mem >/sys/power/state -- works fine

# echo 3 >/proc/sys/kernel/acpi_video_flags
# echo mem >/sys/power state -- works fine

So HAL contains an apparently unnecessary quirk for my laptop, and
4b4f7280 breaks that quirk. Of course, it's entirely possible that
4b4f7280 is 100% correct, but that the quirk only worked by accident and
4b4f7280 broke the call into video BIOS.

--Andy

2008-07-12 23:15:18

by H. Peter Anvin

[permalink] [raw]
Subject: RE: [RFT] x86 acpi: normalize segment descriptor register on resume

It seems like a reasonable guess at least. I won't have a chance to dig into the details until late this evening or tomorrow, depending on when I get home tonight.

--
Sent from my mobile phone (pardon any lack of formatting)


-----Original Message-----
From: Rafael J. Wysocki <[email protected]>
Sent: Saturday, July 12, 2008 13:53
To: Andy Lutomirski <[email protected]>; H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>; Ingo Molnar <[email protected]>; [email protected]; ACPI Devel Maling List <[email protected]>; LKML <[email protected]>; pm list <[email protected]>; Pavel Machek <[email protected]>
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> >> My Lenovo X61s fails to resume if I suspend it from within X, on both
> >> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> >> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> >> -rc8 and -rc9 is also consistent with this being the problem.
> >>
> >> The symptom is that, when I push the power button to resume, the hard
> >> drive light turns on, the fan turns on, then the hard drive light turns
> >> off, the sleep light stays on, and the fan keeps running. Sometimes the
> >> battery light will blink off very briefly (1/4 sec, maybe) every few
> >> seconds. The system is locked hard at this point.
> >>
> >> I'm using Ubuntu Hardy userspace.
> >
> > Well, that's bad.
> >
> > There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> > for this bug and you've just confirmed my suspicion that this particular
> > commit is to blame.
> >
> > Can you please see if the appended patch changes anything?
>
> More correctly:
>
> If I suspend by typing pm-suspend or echo mem >/sys/power/state, then it
> resumes just fine. If I log in to Gnome and push the suspend button,
> then it does not resume. This seems to be the case with or without your
> patch.

Is there an Intel graphics in your box?

> -rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
> I suspend.

That's _really_ strange.

In fact I have only one explanation, which is that the Gnome suspend button
causes some user-space quirks to be applied, which are harmful and break the
resume. Also, without commit 4b4f7280 those quirks might have not been really
executed. Peter, does it sound reasonable?

Thanks,
Rafael

2008-07-12 23:31:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sunday, 13 of July 2008, Andy Lutomirski wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> >> Rafael J. Wysocki wrote:
> >>> On Saturday, 12 of July 2008, Andy Lutomirski wrote:
> >>>> My Lenovo X61s fails to resume if I suspend it from within X, on both
> >>>> 2.6.26-rc9 and recent wireless-testing. 2.6.26-rc8 is fine, as is
> >>>> wireless-testing with 4b4f7280 reverted. My in-progress bisect between
> >>>> -rc8 and -rc9 is also consistent with this being the problem.
> >>>>
> >>>> The symptom is that, when I push the power button to resume, the hard
> >>>> drive light turns on, the fan turns on, then the hard drive light turns
> >>>> off, the sleep light stays on, and the fan keeps running. Sometimes the
> >>>> battery light will blink off very briefly (1/4 sec, maybe) every few
> >>>> seconds. The system is locked hard at this point.
> >>>>
> >>>> I'm using Ubuntu Hardy userspace.
> >>> Well, that's bad.
> >>>
> >>> There is the bugzilla entry at http://bugzilla.kernel.org/show_bug.cgi?id=11064
> >>> for this bug and you've just confirmed my suspicion that this particular
> >>> commit is to blame.
> >>>
> >>> Can you please see if the appended patch changes anything?
> >> More correctly:
> >>
> >> If I suspend by typing pm-suspend or echo mem >/sys/power/state, then it
> >> resumes just fine. If I log in to Gnome and push the suspend button,
> >> then it does not resume. This seems to be the case with or without your
> >> patch.
> >
> > Is there an Intel graphics in your box?
>
> Yes.
>
> >
> >> -rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
> >> I suspend.
> >
> > That's _really_ strange.
> >
> > In fact I have only one explanation, which is that the Gnome suspend button
> > causes some user-space quirks to be applied, which are harmful and break the
> > resume. Also, without commit 4b4f7280 those quirks might have not been really
> > executed. Peter, does it sound reasonable?
>
> Bingo. It's a HAL quirk.
>
> Testing from the console (not X):
>
> With 4b4f7280:
> # echo mem >/sys/power/state -- works fine
>
> # echo 3 >/proc/sys/kernel/acpi_video_flags
> # echo mem >/sys/power state -- fails to resume
>
> Without 4b4f7280:
> # echo mem >/sys/power/state -- works fine
>
> # echo 3 >/proc/sys/kernel/acpi_video_flags
> # echo mem >/sys/power state -- works fine
>
> So HAL contains an apparently unnecessary quirk for my laptop, and
> 4b4f7280 breaks that quirk. Of course, it's entirely possible that
> 4b4f7280 is 100% correct, but that the quirk only worked by accident and
> 4b4f7280 broke the call into video BIOS.

We've had reports from users of Intel graphics and the i915 driver that
previously working quirks started to break their systems with 2.6.26-rc, but
instead the plain "echo mem > /sys/power/state" started to work for them.
Your system may be one of these, but I wonder what the effect of commit
4b4f7280 is.

The first possibility is that the quirks actually didn't work on your system
with 2.6.26-rc before commit 4b4f7280 at all for some obscure reason and
that commit made them work again which in turn resulted in the breakage.

The second possibility is that commit 4b4f7280 actually broke those quirks.

I'm not sure if it's worth the effort to check which of the above really
happened. After all, you can suspend and resume the box without any quirks
now. ;-)

Thanks,
Rafael

2008-07-13 08:55:29

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Hi!

> > > Is there an Intel graphics in your box?
> >
> > Yes.
> >
> > >
> > >> -rc8 and -rc9 with the original patch 4b4f7280 resume fine no matter how
> > >> I suspend.
> > >
> > > That's _really_ strange.
> > >
> > > In fact I have only one explanation, which is that the Gnome suspend button
> > > causes some user-space quirks to be applied, which are harmful and break the
> > > resume. Also, without commit 4b4f7280 those quirks might have not been really
> > > executed. Peter, does it sound reasonable?
> >
> > Bingo. It's a HAL quirk.
> >
> > Testing from the console (not X):
> >
> > With 4b4f7280:
> > # echo mem >/sys/power/state -- works fine
> >
> > # echo 3 >/proc/sys/kernel/acpi_video_flags
> > # echo mem >/sys/power state -- fails to resume
> >
> > Without 4b4f7280:
> > # echo mem >/sys/power/state -- works fine
> >
> > # echo 3 >/proc/sys/kernel/acpi_video_flags
> > # echo mem >/sys/power state -- works fine
> >
> > So HAL contains an apparently unnecessary quirk for my laptop, and
> > 4b4f7280 breaks that quirk. Of course, it's entirely possible that
> > 4b4f7280 is 100% correct, but that the quirk only worked by accident and
> > 4b4f7280 broke the call into video BIOS.
>
> We've had reports from users of Intel graphics and the i915 driver that
> previously working quirks started to break their systems with 2.6.26-rc, but
> instead the plain "echo mem > /sys/power/state" started to work for them.
> Your system may be one of these, but I wonder what the effect of commit
> 4b4f7280 is.

In such case, you could verify by disabling intel framebuffer and just
using plain VGA driver (or vesafb).

Breaking video bios calls would be bad.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-07-13 09:16:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


* Rafael J. Wysocki <[email protected]> wrote:

> > Bingo. It's a HAL quirk.
> >
> > Testing from the console (not X):
> >
> > With 4b4f7280:
> > # echo mem >/sys/power/state -- works fine
> >
> > # echo 3 >/proc/sys/kernel/acpi_video_flags
> > # echo mem >/sys/power state -- fails to resume
> >
> > Without 4b4f7280:
> > # echo mem >/sys/power/state -- works fine
> >
> > # echo 3 >/proc/sys/kernel/acpi_video_flags
> > # echo mem >/sys/power state -- works fine
> >
> > So HAL contains an apparently unnecessary quirk for my laptop, and
> > 4b4f7280 breaks that quirk. Of course, it's entirely possible that
> > 4b4f7280 is 100% correct, but that the quirk only worked by accident and
> > 4b4f7280 broke the call into video BIOS.
>
> We've had reports from users of Intel graphics and the i915 driver
> that previously working quirks started to break their systems with
> 2.6.26-rc, but instead the plain "echo mem > /sys/power/state" started
> to work for them. Your system may be one of these, but I wonder what
> the effect of commit 4b4f7280 is.
>
> The first possibility is that the quirks actually didn't work on your
> system with 2.6.26-rc before commit 4b4f7280 at all for some obscure
> reason and that commit made them work again which in turn resulted in
> the breakage.
>
> The second possibility is that commit 4b4f7280 actually broke those
> quirks.
>
> I'm not sure if it's worth the effort to check which of the above
> really happened. After all, you can suspend and resume the box
> without any quirks now. ;-)

which is the ideal situation :-)

we still need to find the HAL quirk and disable it, right?

Ingo

2008-07-13 12:03:47

by Matthew Garrett

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:

> we still need to find the HAL quirk and disable it, right?

Not without understanding what the cause is. If the video BIOS calls are
generically broken, then we have a problem.

--
Matthew Garrett | [email protected]

2008-07-13 15:50:24

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Matthew Garrett wrote:
> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
>
>> we still need to find the HAL quirk and disable it, right?
>
> Not without understanding what the cause is. If the video BIOS calls are
> generically broken, then we have a problem.
>

The HAL quirk is the very first one here:

http://gitweb.freedesktop.org/?p=hal-info.git;a=blob;f=fdi/information/10freedesktop/20-video-quirk-pm-lenovo.fdi

I removed it from the HAL config and suspending w/ the hardware button
works fine now with -rc9-wl and on Ubuntu's stock .24 kernel.

I'll file the obvious bug report against HAL, but it might annoy users
if new kernel + old HAL = broken system, when the old kernel worked
fine. An alternative might be to make the change in 4b4f7280
conditional on a new acpi_video_flags quirk so that previously-working
systems keep working.

The quirk that actually breaks is s3_bios. s3_mode still works, and
power/video.txt says:

(2) systems where it is possible to call the video BIOS during S3
resume. Unfortunately, it is not correct to call the video BIOS at
that point, but it happens to work on some machines. Use
acpi_sleep=s3_bios.

--Andy

2008-07-13 18:01:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sunday, 13 of July 2008, Matthew Garrett wrote:
> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
>
> > we still need to find the HAL quirk and disable it, right?
>
> Not without understanding what the cause is. If the video BIOS calls are
> generically broken, then we have a problem.

To check that, we need a system which is confirmed to need acpi_sleep=s3_bios
to resume without commit 4b4f7280.

Unfortunately, I don't have any systems like this.

Thanks,
Rafael

2008-07-13 18:23:56

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Pavel Machek wrote:
>
> In such case, you could verify by disabling intel framebuffer and just
> using plain VGA driver (or vesafb).
>
> Breaking video bios calls would be bad.
> Pavel

It would... but it doesn't seem to make much sense. What, in detail,
does this particular quirk do?

-hpa

2008-07-13 18:43:20

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andy Lutomirski wrote:
> Matthew Garrett wrote:
>> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
>>
>>> we still need to find the HAL quirk and disable it, right?
>>
>> Not without understanding what the cause is. If the video BIOS calls
>> are generically broken, then we have a problem.
>>
>
> The HAL quirk is the very first one here:
>
> http://gitweb.freedesktop.org/?p=hal-info.git;a=blob;f=fdi/information/10freedesktop/20-video-quirk-pm-lenovo.fdi
>
>
> I removed it from the HAL config and suspending w/ the hardware button
> works fine now with -rc9-wl and on Ubuntu's stock .24 kernel.

Hmm, but the change was not supposed to break the s3 bios. Something
fishy is going on. It sounds like the s3 bios relies on some earlier
segment register setup.

If true this means the segment register reset would need to be moved
later after S3 bios ran. Saving/restoring is unfortunately not possible
because we cannot save/restore the hidden state loaded from the GDT earlier.

This is unfortunately a little tricky with the new C wakeup code.

> I'll file the obvious bug report against HAL, but it might annoy users
> if new kernel + old HAL = broken system,

It's the bad side effect of HAL effectively being an out of tree kernel
driver (that just by chance happens to run in user space). Really
all these s3 quirks at least should be in the kernel.

We can't really do much about that now, but longer term it might be useful
to invent some mechanism to tell HAL to disable specific quirks from
the kernel.

-Andi

2008-07-13 19:13:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sunday, 13 of July 2008, Andi Kleen wrote:
> Andy Lutomirski wrote:
> > Matthew Garrett wrote:
> >> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
> >>
> >>> we still need to find the HAL quirk and disable it, right?
> >>
> >> Not without understanding what the cause is. If the video BIOS calls
> >> are generically broken, then we have a problem.
> >>
> >
> > The HAL quirk is the very first one here:
> >
> > http://gitweb.freedesktop.org/?p=hal-info.git;a=blob;f=fdi/information/10freedesktop/20-video-quirk-pm-lenovo.fdi
> >
> >
> > I removed it from the HAL config and suspending w/ the hardware button
> > works fine now with -rc9-wl and on Ubuntu's stock .24 kernel.
>
> Hmm, but the change was not supposed to break the s3 bios. Something
> fishy is going on. It sounds like the s3 bios relies on some earlier
> segment register setup.

Well, we changed the (visible) parts of the segment registers before anyway.

This means that it could only depend on the hidden parts. However, in that
case if it depended on the hidden part of SS, our stack would be broken, so
the quirk wouldn't work (it uses 'call' to run a BIOS routine). In turn, if it
depended on the hidden part of DS, our data register would be broken, so the
resume code itself wouldn't work.

This means it could only depend on the hidden part of ES.

> If true this means the segment register reset would need to be moved
> later after S3 bios ran.

We can't do that. If SS contains garbage, the BIOS call itself will reboot
the box and if DS contains garbage, well ...

> Saving/restoring is unfortunately not possible because we cannot save/restore
> the hidden state loaded from the GDT earlier.
>
> This is unfortunately a little tricky with the new C wakeup code.
>
> > I'll file the obvious bug report against HAL, but it might annoy users
> > if new kernel + old HAL = broken system,
>
> It's the bad side effect of HAL effectively being an out of tree kernel
> driver (that just by chance happens to run in user space). Really
> all these s3 quirks at least should be in the kernel.
>
> We can't really do much about that now, but longer term it might be useful
> to invent some mechanism to tell HAL to disable specific quirks from
> the kernel.

They are in the kernel. In fact, there's a sysctl to switch them on/off and
that's what HAL uses, AFAICS.

Apparently, you can tell HAL not to do that by editing one of its files.

Thanks,
Rafael

2008-07-13 20:11:57

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
> On Sunday, 13 of July 2008, Andi Kleen wrote:
>> Andy Lutomirski wrote:
>>> Matthew Garrett wrote:
>>>> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
>>>>
>>>>> we still need to find the HAL quirk and disable it, right?
>>>> Not without understanding what the cause is. If the video BIOS calls
>>>> are generically broken, then we have a problem.
>>>>
>>> The HAL quirk is the very first one here:
>>>
>>> http://gitweb.freedesktop.org/?p=hal-info.git;a=blob;f=fdi/information/10freedesktop/20-video-quirk-pm-lenovo.fdi
>>>
>>>
>>> I removed it from the HAL config and suspending w/ the hardware button
>>> works fine now with -rc9-wl and on Ubuntu's stock .24 kernel.
>> Hmm, but the change was not supposed to break the s3 bios. Something
>> fishy is going on. It sounds like the s3 bios relies on some earlier
>> segment register setup.
>
> Well, we changed the (visible) parts of the segment registers before anyway.
>
> This means that it could only depend on the hidden parts. However, in that
> case if it depended on the hidden part of SS, our stack would be broken,

We're in real mode for now nd should not care about the hidden state.

> so
> the quirk wouldn't work (it uses 'call' to run a BIOS routine). In turn, if it
> depended on the hidden part of DS, our data register would be broken, so the
> resume code itself wouldn't work.
>
> This means it could only depend on the hidden part of ES.
>
>> If true this means the segment register reset would need to be moved
>> later after S3 bios ran.
>
> We can't do that. If SS contains garbage, the BIOS call itself will reboot
> the box and if DS contains garbage, well ...

But it's apparently not garbage for the s3 bios. So somehow the "garbage"
needs to be kept impact until the S3 BIOS ran.

>> Saving/restoring is unfortunately not possible because we cannot save/restore
>> the hidden state loaded from the GDT earlier.
>>
>> This is unfortunately a little tricky with the new C wakeup code.
>>
>>> I'll file the obvious bug report against HAL, but it might annoy users
>>> if new kernel + old HAL = broken system,
>> It's the bad side effect of HAL effectively being an out of tree kernel
>> driver (that just by chance happens to run in user space). Really
>> all these s3 quirks at least should be in the kernel.
>>
>> We can't really do much about that now, but longer term it might be useful
>> to invent some mechanism to tell HAL to disable specific quirks from
>> the kernel.
>
> They are in the kernel. In fact, there's a sysctl to switch them on/off and
> that's what HAL uses, AFAICS.

The quirk is clearly in user space in hal.

>
> Apparently, you can tell HAL not to do that by editing one of its files.


Sure, but the kernel can hardly edit someone's configuration files.

-Andi

2008-07-13 20:21:50

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

H. Peter Anvin wrote:
> Andi Kleen wrote:
>>
>> Hmm, but the change was not supposed to break the s3 bios. Something
>> fishy is going on. It sounds like the s3 bios relies on some earlier
>> segment register setup.
>>
>> If true this means the segment register reset would need to be moved
>> later after S3 bios ran. Saving/restoring is unfortunately not possible
>> because we cannot save/restore the hidden state loaded from the GDT
>> earlier.
>>
>
> That really doesn't make sense, though. The VESA BIOS has to be entered
> in clean real mode; it's designed to be entered from reset, after all.
> There is definitely something fishy going on, but I don't think this
> particular aspect is it.

It probably switches to protected mode. I noticed this on my old
Fujitsu laptop when I tried to make the S3 wakeup run in the s2ram x86 emulator
and found it entered protected mode at some point, which x86emu
didn't support.

I guess Lenovo is doing the same.

And that protected mode code relies on some GDT values that have been
loaded earlier when the BIOS also went into protected mode.

It seems the BIOS programmers really don't like real mode anymore.
Somehow understandable.

-Andi

2008-07-13 20:24:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Andi Kleen wrote:
>
> Hmm, but the change was not supposed to break the s3 bios. Something
> fishy is going on. It sounds like the s3 bios relies on some earlier
> segment register setup.
>
> If true this means the segment register reset would need to be moved
> later after S3 bios ran. Saving/restoring is unfortunately not possible
> because we cannot save/restore the hidden state loaded from the GDT earlier.
>

That really doesn't make sense, though. The VESA BIOS has to be entered
in clean real mode; it's designed to be entered from reset, after all.
There is definitely something fishy going on, but I don't think this
particular aspect is it.

What's *really* odd is that this was required with the old code but
doesn't work at all with the new code. The former is understandable,
the latter is not.

-hpa

2008-07-13 20:27:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sunday, 13 of July 2008, Andi Kleen wrote:
> Rafael J. Wysocki wrote:
> > On Sunday, 13 of July 2008, Andi Kleen wrote:
> >> Andy Lutomirski wrote:
> >>> Matthew Garrett wrote:
> >>>> On Sun, Jul 13, 2008 at 11:15:24AM +0200, Ingo Molnar wrote:
> >>>>
> >>>>> we still need to find the HAL quirk and disable it, right?
> >>>> Not without understanding what the cause is. If the video BIOS calls
> >>>> are generically broken, then we have a problem.
> >>>>
> >>> The HAL quirk is the very first one here:
> >>>
> >>> http://gitweb.freedesktop.org/?p=hal-info.git;a=blob;f=fdi/information/10freedesktop/20-video-quirk-pm-lenovo.fdi
> >>>
> >>>
> >>> I removed it from the HAL config and suspending w/ the hardware button
> >>> works fine now with -rc9-wl and on Ubuntu's stock .24 kernel.
> >> Hmm, but the change was not supposed to break the s3 bios. Something
> >> fishy is going on. It sounds like the s3 bios relies on some earlier
> >> segment register setup.
> >
> > Well, we changed the (visible) parts of the segment registers before anyway.
> >
> > This means that it could only depend on the hidden parts. However, in that
> > case if it depended on the hidden part of SS, our stack would be broken,
>
> We're in real mode for now nd should not care about the hidden state.

That's the point of commit 4b4f7280.

Without that commit we used to reset segment registers, but there were
some post-protected mode garbage in the hidden part of SS on some systems, so
things broke.

We now make sure the hidden parts of all segment registers are reset.
If anything fails as a result of that, it's broken beyond repair IMO.

> > so the quirk wouldn't work (it uses 'call' to run a BIOS routine). In turn, if it
> > depended on the hidden part of DS, our data register would be broken, so the
> > resume code itself wouldn't work.
> >
> > This means it could only depend on the hidden part of ES.
> >
> >> If true this means the segment register reset would need to be moved
> >> later after S3 bios ran.
> >
> > We can't do that. If SS contains garbage, the BIOS call itself will reboot
> > the box and if DS contains garbage, well ...
>
> But it's apparently not garbage for the s3 bios. So somehow the "garbage"
> needs to be kept impact until the S3 BIOS ran.

We can't do that on all systems, because some (I'd say many) of them don't
work if we do.

If s3_bios depends on the (invalid) contents of the hidden part of any segment
register, it's just terminally broken.

But.

We didn't modify %cx before - may that matter?

Thanks,
Rafael

2008-07-13 20:30:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sunday, 13 of July 2008, Andi Kleen wrote:
> H. Peter Anvin wrote:
> > Andi Kleen wrote:
> >>
> >> Hmm, but the change was not supposed to break the s3 bios. Something
> >> fishy is going on. It sounds like the s3 bios relies on some earlier
> >> segment register setup.
> >>
> >> If true this means the segment register reset would need to be moved
> >> later after S3 bios ran. Saving/restoring is unfortunately not possible
> >> because we cannot save/restore the hidden state loaded from the GDT
> >> earlier.
> >>
> >
> > That really doesn't make sense, though. The VESA BIOS has to be entered
> > in clean real mode; it's designed to be entered from reset, after all.
> > There is definitely something fishy going on, but I don't think this
> > particular aspect is it.
>
> It probably switches to protected mode. I noticed this on my old
> Fujitsu laptop when I tried to make the S3 wakeup run in the s2ram x86 emulator
> and found it entered protected mode at some point, which x86emu
> didn't support.
>
> I guess Lenovo is doing the same.
>
> And that protected mode code relies on some GDT values that have been
> loaded earlier when the BIOS also went into protected mode.
>
> It seems the BIOS programmers really don't like real mode anymore.
> Somehow understandable.

So should we check if we are in real mode at the entry point?

That would compilcate things a lot.

Thanks,
Rafael

2008-07-13 20:45:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 36af01f..97648aa 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -23,6 +23,15 @@ static unsigned long acpi_realmode;
static char temp_stack[10240];
#endif

+/* XXX: this macro should move to asm-x86/segment.h and be shared with the
+ boot code... */
+#define GDT_ENTRY(flags, base, limit) \
+ (((u64)(base & 0xff000000) << 32) | \
+ ((u64)flags << 40) | \
+ ((u64)(limit & 0x00ff0000) << 32) | \
+ ((u64)(base & 0x00ffffff) << 16) | \
+ ((u64)(limit & 0x0000ffff)))
+
/**
* acpi_save_state_mem - save kernel state
*
@@ -58,11 +67,11 @@ int acpi_save_state_mem(void)
((char *)&header->wakeup_gdt - (char *)acpi_realmode))
<< 16);
/* GDT[1]: real-mode-like code segment */
- header->wakeup_gdt[1] = (0x009bULL << 40) +
- ((u64)acpi_wakeup_address << 16) + 0xffff;
+ header->wakeup_gdt[1] =
+ GDT_ENTRY(0x809b, acpi_wakeup_address << 16, 0xfffff);
/* GDT[2]: real-mode-like data segment */
- header->wakeup_gdt[2] = (0x0093ULL << 40) +
- ((u64)acpi_wakeup_address << 16) + 0xffff;
+ header->wakeup_gdt[2] =
+ GDT_ENTRY(0x8093, acpi_wakeup_address << 16, 0xfffff);

#ifndef CONFIG_64BIT
store_gdt((struct desc_ptr *)&header->pmode_gdt);


Attachments:
diff (1.25 kB)
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun, 13 Jul 2008, Andi Kleen wrote:
> It probably switches to protected mode. I noticed this on my old
> Fujitsu laptop when I tried to make the S3 wakeup run in the s2ram x86 emulator
> and found it entered protected mode at some point, which x86emu
> didn't support.
>
> I guess Lenovo is doing the same.
>
> And that protected mode code relies on some GDT values that have been
> loaded earlier when the BIOS also went into protected mode.
>
> It seems the BIOS programmers really don't like real mode anymore.
> Somehow understandable.

If it were just Lenovo, it shouldn't be too difficult to get them to fix the
BIOS. But since other vendors have the same problem, that won't fly as a
solution.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2008-07-14 01:31:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun, Jul 13, 2008 at 01:38:27PM -0700, H. Peter Anvin wrote:

> The most likely explanation for this is that the VESA BIOS expects to be
> entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real
> Mode. Here is a completely untested patch which changes the segment
> descriptors to Big Real Mode instead. It would be worth testing out.

Recent Lenovo BIOSes will read from somewhere just under the 4GB
boundary if the video BIOS is called, so this sounds likely.
Unfortunately my T61 is, uh, unhappy right now, so I can't test this.

--
Matthew Garrett | [email protected]

2008-07-14 02:43:18

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Rafael J. Wysocki wrote:
>
> If s3_bios depends on the (invalid) contents of the hidden part of any segment
> register, it's just terminally broken.
>

Depending on it being in Big Real Mode isn't all that broken, however,
especially on laptops where the vendor has control over the system
BIOS/video BIOS.

Enabling BRM by default *shouldn't* break anything, so it might be what
we want to do anyway.

-hpa

2008-07-14 04:25:41

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

H. Peter Anvin wrote:
>
> The most likely explanation for this is that the VESA BIOS expects to be
> entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real
> Mode. Here is a completely untested patch which changes the segment
> descriptors to Big Real Mode instead. It would be worth testing out.
>

... and here is one that actually has a prayer of actually working.

-hpa


Attachments:
linux-2.6.26-bigrm-wakeup.patch (1.24 kB)

2008-07-14 06:14:03

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun 2008-07-13 22:22:06, Henrique de Moraes Holschuh wrote:
> On Sun, 13 Jul 2008, Andi Kleen wrote:
> > It probably switches to protected mode. I noticed this on my old
> > Fujitsu laptop when I tried to make the S3 wakeup run in the s2ram x86 emulator
> > and found it entered protected mode at some point, which x86emu
> > didn't support.
> >
> > I guess Lenovo is doing the same.
> >
> > And that protected mode code relies on some GDT values that have been
> > loaded earlier when the BIOS also went into protected mode.
> >
> > It seems the BIOS programmers really don't like real mode anymore.
> > Somehow understandable.
>
> If it were just Lenovo, it shouldn't be too difficult to get them to fix the
> BIOS. But since other vendors have the same problem, that won't fly as a
> solution.

With all the BIOSes in field, practically impossible to update? (My
x60 has no CDrom..) No, I don't think we can expect fix from BIOS vendors.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-07-14 06:36:19

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun 2008-07-13 11:16:38, H. Peter Anvin wrote:
> Pavel Machek wrote:
>>
>> In such case, you could verify by disabling intel framebuffer and just
>> using plain VGA driver (or vesafb).
>>
>> Breaking video bios calls would be bad.
>
> It would... but it doesn't seem to make much sense. What, in detail, does
> this particular quirk do?

lcall 0xc000, 3 into the BIOS :-(.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-07-14 07:38:38

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Sun 2008-07-13 21:18:02, H. Peter Anvin wrote:
> H. Peter Anvin wrote:
>>
>> The most likely explanation for this is that the VESA BIOS expects to be
>> entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real
>> Mode. Here is a completely untested patch which changes the segment
>> descriptors to Big Real Mode instead. It would be worth testing out.
>>
>
> ... and here is one that actually has a prayer of actually working.

I tried 2.6.26, but it does not seem to be broken here :-(. Thinkpad
x60. So it does not seem _all_ thinkpads have this problem...

Pavel


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Mon, 14 Jul 2008, Pavel Machek wrote:
> On Sun 2008-07-13 22:22:06, Henrique de Moraes Holschuh wrote:
> > On Sun, 13 Jul 2008, Andi Kleen wrote:
> > > It probably switches to protected mode. I noticed this on my old
> > > Fujitsu laptop when I tried to make the S3 wakeup run in the s2ram x86 emulator
> > > and found it entered protected mode at some point, which x86emu
> > > didn't support.
> > >
> > > I guess Lenovo is doing the same.
> > >
> > > And that protected mode code relies on some GDT values that have been
> > > loaded earlier when the BIOS also went into protected mode.
> > >
> > > It seems the BIOS programmers really don't like real mode anymore.
> > > Somehow understandable.
> >
> > If it were just Lenovo, it shouldn't be too difficult to get them to fix the
> > BIOS. But since other vendors have the same problem, that won't fly as a
> > solution.
>
> With all the BIOSes in field, practically impossible to update? (My
> x60 has no CDrom..) No, I don't think we can expect fix from BIOS vendors.

I explicitly said it wouldn't fly as a solution, because I doubt we could
expect a fix from EVERY BIOS vendor that has such problems. That sort of
account for machines that have not been updated, too.

But I do think "your BIOS is broken and there is an update available from
the vendor, apply it" *IS* a valid response if a workaround is not
desireable for some reason (e.g. it is too complex, or it would make the
kernel worse for a lot of others that did things right and don't have the
bug). I don't think we need to fear that happening on this particular
issue, but still...

As for your X60, you can use a USB mass storage device like a pendrive.
Someone already did all the hard work:
http://thinkwiki.org/wiki/BIOS_Upgrade/X_Series

In fact, as I rule, I cannot recommend using anything but the latest BIOS
and EC releases on a ThinkPad. That thing is too complex to live without
updates, just like the kernel (even if, just like the kernel, upgrading the
BIOS and EC can cause new bugs). Look at the changelog for your X60 BIOS
and EC firmware at
http://www-307.ibm.com/pc/support/site.wss/MIGR-65345.html. It has some
scary stuff fixed, and that's just what Lenovo decided to disclose, those
changelogs are NOT complete and often fail to list changes that are of
interest to Linux (such as fixes to the ACPI tables).

In fact, look at this interesting entry on the X60 BIOS changelog:
Version 7BETD3WW (2.14):
(Fix) Memory address more than 4GB can't be accessed under Vista
WindowsPE. (BIOS)

It is also listed for the X61 and the tablets on their BIOS updates. Maybe
that 4GB change could be related to the issue at hand?

It would be nice to get a report on the Big Realmote patch, and also know
the BIOS version of ThinkPads that have (and don't have) the bug.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2008-07-14 20:08:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Monday, 14 of July 2008, H. Peter Anvin wrote:
> H. Peter Anvin wrote:
> >
> > The most likely explanation for this is that the VESA BIOS expects to be
> > entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real
> > Mode. Here is a completely untested patch which changes the segment
> > descriptors to Big Real Mode instead. It would be worth testing out.
> >
>
> ... and here is one that actually has a prayer of actually working.

It works on my box and if that's the same as the one you attached to the
Bugzilla entry, it has also been reported to work on an X61s.

Thanks,
Rafael

2008-07-14 20:08:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

On Monday, 14 of July 2008, H. Peter Anvin wrote:
> Rafael J. Wysocki wrote:
> >
> > If s3_bios depends on the (invalid) contents of the hidden part of any segment
> > register, it's just terminally broken.
> >
>
> Depending on it being in Big Real Mode isn't all that broken, however,
> especially on laptops where the vendor has control over the system
> BIOS/video BIOS.
>
> Enabling BRM by default *shouldn't* break anything, so it might be what
> we want to do anyway.

Agreed.

2008-07-16 14:14:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume

Sorry for the delay in testing. This appears to work and fix the problem.

--Andy

On Mon, Jul 14, 2008 at 12:18 AM, H. Peter Anvin <[email protected]> wrote:
> H. Peter Anvin wrote:
>>
>> The most likely explanation for this is that the VESA BIOS expects to be
>> entered in Big Real Mode (*.limit = 0xffffffff) instead of ordinary Real
>> Mode. Here is a completely untested patch which changes the segment
>> descriptors to Big Real Mode instead. It would be worth testing out.
>>
>
> ... and here is one that actually has a prayer of actually working.
>
> -hpa
>
>
> --- AV & Spam Filtering by M+Guardian - Risk Free Email (TM) ---
>

2008-07-16 14:24:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFT] x86 acpi: normalize segment descriptor register on resume


* Andrew Lutomirski <[email protected]> wrote:

> Sorry for the delay in testing. This appears to work and fix the problem.

great, thanks!

Peter's fix is now upstream as:

| commit 3bf2e77453a87c22eb57ed4926760ac131c84459
| Author: H. Peter Anvin <[email protected]>
| Date: Sun Jul 13 21:18:02 2008 -0700
|
| x86, suspend, acpi: enter Big Real Mode

and has been requested to be added to v2.6.26.1 as well.

Ingo