2010-01-06 01:03:43

by Christian Kujau

Subject: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

Hi there,

a bit late with the testing again, I just found out that my Xen
DomU won't boot with 2.6.33-rc2. The last working one is 2.6.32, I'll try
to bisect if needed.

The booting stops at:

[ 0.010000] no hardware sampling interrupt available.
[ 0.010000] Intel PMU driver.
[ 0.010000] ... version: 2
[ 0.010000] ... bit width: 40
[ 0.010000] ... generic registers: 2
[ 0.010000] ... value mask: 000000ffffffffff
[ 0.010000] ... max period: 000000007fffffff
[ 0.010000] ... fixed-purpose events: 3
[ 0.010000] ... event mask: 0000000700000003
[ 0.011314] Freeing SMP alternatives: 24k freed


And "xm dmesg" says:

xen# xm dmesg
(XEN) traps.c:244:d88 Guest switching to user mode with no user page tables
(XEN) traps.c:273:d88 Fatal error
(XEN) domain_crash called from traps.c:274
(XEN) Domain 88 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.2.1-rc1-pre x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff810012eb>]
(XEN) RFLAGS: 0000000000240246 CONTEXT: guest
(XEN) rax: 0000000000000017 rbx: 0000000000000000 rcx: ffffffff810012eb
(XEN) rdx: 0000000000000000 rsi: ffffffff810488a0 rdi: 0000000000000000
(XEN) rbp: 0000000000000000 rsp: ffff88000f83df90 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000240246
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026b0
(XEN) cr3: 0000000096ba9000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffff88000f83df90:
(XEN) 0000000000000000 0000000000000202 ffffffff81009880 0000000000000100
(XEN) ffffffff81009880 0000000000000033 0000000000000202 0000000000000000
(XEN) 000000000000002b ffffffff81009880 0000000000000011 0000000000000202
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000

xen# gdb vmlinux.2010-01-05.6451
(no debugging symbols found)
(gdb) x/i 0xffffffff810012eb
0xffffffff810012eb <hypercall_page+747>: add %al,(%rax)


The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
to upgrade, as I don't have a serial console to this MacMini, if things go
wrong :-\

I've put the .config (make oldconfig from 2.6.32) and dmesg on:

http://nerdbynature.de/bits/2.6.33-rc2/xen/

Any ideas?

Thanks,
Christian.
--
BOFH excuse #349:

Stray Alpha Particles from memory packaging caused Hard Memory Error on Server.


2010-01-06 03:38:36

by Jeremy Fitzhardinge

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On 01/06/2010 12:03 PM, Christian Kujau wrote:
> Hi there,
>
> a bit late with the testing again, I just found out that my Xen
> DomU won't boot with 2.6.33-rc2. The last working one is 2.6.32, I'll try
> to bisect if needed.
>
> The booting stops at:
>
> [ 0.010000] no hardware sampling interrupt available.
> [ 0.010000] Intel PMU driver.
> [ 0.010000] ... version: 2
> [ 0.010000] ... bit width: 40
> [ 0.010000] ... generic registers: 2
> [ 0.010000] ... value mask: 000000ffffffffff
> [ 0.010000] ... max period: 000000007fffffff
> [ 0.010000] ... fixed-purpose events: 3
> [ 0.010000] ... event mask: 0000000700000003
> [ 0.011314] Freeing SMP alternatives: 24k freed
>
>
> And "xm dmesg" says:
>
> xen# xm dmesg
> (XEN) traps.c:244:d88 Guest switching to user mode with no user page tables
> (XEN) traps.c:273:d88 Fatal error
>

*Really* weird. No idea how it could get into that state... I've never
seen this message before, even during development. I'd suspect either a
compiler bug, a miscompile, or some bad interaction with another patch.
A bisection would be useful.

> (XEN) domain_crash called from traps.c:274
> (XEN) Domain 88 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.2.1-rc1-pre x86_64 debug=n Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e033:[<ffffffff810012eb>]
> (XEN) RFLAGS: 0000000000240246 CONTEXT: guest
> (XEN) rax: 0000000000000017 rbx: 0000000000000000 rcx: ffffffff810012eb
> (XEN) rdx: 0000000000000000 rsi: ffffffff810488a0 rdi: 0000000000000000
> (XEN) rbp: 0000000000000000 rsp: ffff88000f83df90 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000240246
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026b0
> (XEN) cr3: 0000000096ba9000 cr2: 0000000000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
> (XEN) Guest stack trace from rsp=ffff88000f83df90:
> (XEN) 0000000000000000 0000000000000202 ffffffff81009880 0000000000000100
> (XEN) ffffffff81009880 0000000000000033 0000000000000202 0000000000000000
> (XEN) 000000000000002b ffffffff81009880 0000000000000011 0000000000000202
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> xen# gdb vmlinux.2010-01-05.6451
> (no debugging symbols found)
> (gdb) x/i 0xffffffff810012eb
> 0xffffffff810012eb <hypercall_page+747>: add %al,(%rax)
>
>
> The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
> to upgrade, as I don't have a serial console to this MacMini, if things go
> wrong :-\
>
> I've put the .config (make oldconfig from 2.6.32) and dmesg on:
>
> http://nerdbynature.de/bits/2.6.33-rc2/xen/
>


J

2010-01-06 03:48:13

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> *Really* weird. No idea how it could get into that state... I've never seen
> this message before, even during development.

I've seen it only once in the xen-devel archives, but couldn't relate it
to my case:

http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00861.html

> I'd suspect either a compiler
> bug, a miscompile, or some bad interaction with another patch. A bisection
> would be useful.

OK, will do.

> > The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
> > to upgrade, as I don't have a serial console to this MacMini, if things go

Just out of curiosity: will this be an issue in the near future? I'm
trying to follow kernel development in my DomU, but can't upgrade Dom0 for
quite a while, so kernel versions (DomU vs Dom0) will diverge more and
more.

Thanks,
Christian.
--
BOFH excuse #74:

You're out of memory

2010-01-06 05:14:40

by Jeremy Fitzhardinge

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On 01/06/2010 02:48 PM, Christian Kujau wrote:
>>> The Dom0 is running a 2.6.24-24-xen Ubuntu kernel and I'm kinda reluctant
>>> to upgrade, as I don't have a serial console to this MacMini, if things go
>>>
> Just out of curiosity: will this be an issue in the near future? I'm
> trying to follow kernel development in my DomU, but can't upgrade Dom0 for
> quite a while, so kernel versions (DomU vs Dom0) will diverge more and
> more.
>

No. New domU on old dom0 or vice-versa should work indefinitely. If
not, report it as a bug.

J

2010-01-06 11:06:16

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> this message before, even during development. I'd suspect either a compiler
> bug, a miscompile, or some bad interaction with another patch.

I'm using the same compiler (gcc-4.4.2-8, binutils-2.20) for the
(working) 2.6.32 (Linus' git tree) and did a "make
distclean" to double check, but 2.6.33-rc2/3 just wouldn't boot.

> A bisection would be useful.

I'm *almost* there, only 1 or 2 revisions left, I attached the
bisect log below. However, now this happens during "xm create" on the
DomU console:

------------[ cut here ]------------
WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130
noop_apic_write+0x40/0x50()
Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
Call Trace:
[<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
[<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
[<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
[<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
[<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
[<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
---[ end trace a7919e7f17c0a725 ]---


Then the DomU panics with:

[ 0.012307] Freeing SMP alternatives: 24k freed
[ 0.012398] general protection fault: 0000 [#1] SMP


Note: there's nothing in "xm dmesg" (dom0) this time, the DomU is
"running" (panicked, so unusable, but I still have to "xm destroy" it).


Also, I see the domU now crashes with "general protection fault": in the
(old) posting from xen-devel[0] they were talking about GPF as well. So
maybe it's related after all.


I've put the full dmesg on: http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/

- 3bd95dfb182969dc6d2a317c150e0df7107608d3.txt, that's what "git log"
currently says (with the git bisect log below).

- f443ff4201dd25cd4dec183f9919ecba90c8edc2.txt - this happened a few git
bisect iterations earlier, with a similar picture: "xm dmesg" was
empty, the domU panicked. Back then I did "git bisect skip", because
I had ~20 or so revisions left and it worked. The next "bad" revision
(and all the other bad revisions during the bisection) had the same
picture as my initial report.


I'm not sure how to mark the current revision and I don't know if I can
"skip" again, because I might have only one revision left.

Given that the DomU panics just after "Freeing SMP alternatives", as
v2.6.33-rc3 does, it would seem that it's "bad".

I'll try to get the bisection over to a box with X11, so that
I can use "git bisect visualize" to see what revisions are left (or is
there an easier way to find out?)


But maybe that's close enough to get an idea what's going on here?

Christian.

[0] http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00861.html


git-bisect start
# bad: [74d2e4f8d79ae0c4b6ec027958d5b18058662eea] Linux 2.6.33-rc3
git-bisect bad 74d2e4f8d79ae0c4b6ec027958d5b18058662eea
# good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git-bisect good 22763c5cf3690a681551162c15d34d935308c8d7
# good: [6825fbc4cb219f2c98bb7d157915d797cf5cb823] Merge branch 'next-i2c' of git://git.fluff.org/bjdooks/linux
git-bisect good 6825fbc4cb219f2c98bb7d157915d797cf5cb823
# good: [471452104b8520337ae2fb48c4e61cd4896e025d] const: constify remaining dev_pm_ops
git-bisect good 471452104b8520337ae2fb48c4e61cd4896e025d
# bad: [288f02bbb6e9609cbaf1eb7a9cb97ae45ce090b2] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git-bisect bad 288f02bbb6e9609cbaf1eb7a9cb97ae45ce090b2
# good: [60d9aa758c00f20ade0cb1951f6a934f628dd2d7] Merge git://git.infradead.org/mtd-2.6
git bisect good 60d9aa758c00f20ade0cb1951f6a934f628dd2d7
# good: [525995d77ca08dfc2ba6f8e606f93694271dbd66] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin
git bisect good 525995d77ca08dfc2ba6f8e606f93694271dbd66
# bad: [8aedf8a6ae98d5d4df3254b6afb7e4432d9d8600] Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 8aedf8a6ae98d5d4df3254b6afb7e4432d9d8600
# bad: [bac5e54c29f352d962a2447d22735316b347b9f1] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
git bisect bad bac5e54c29f352d962a2447d22735316b347b9f1
# bad: [61ecdb84c1f05ad445db4584ae375a15c0e8ae47] Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect bad 61ecdb84c1f05ad445db4584ae375a15c0e8ae47
# good: [e36c54582c6f14adc9e10473e2aec2cc4f0acc03] tracing: Fix return of trace_dump_stack()
git bisect good e36c54582c6f14adc9e10473e2aec2cc4f0acc03
# skip: [e840227c141116171c89ab1abb5cc9fee6fdb488] x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
git bisect skip e840227c141116171c89ab1abb5cc9fee6fdb488
# good: [27f59559d63375a4d59e7c720a439d9f0b47edad] x86: Merge sys_iopl
git bisect good 27f59559d63375a4d59e7c720a439d9f0b47edad
# bad: [f443ff4201dd25cd4dec183f9919ecba90c8edc2] x86: Sync 32/64-bit kernel_thread
git bisect bad f443ff4201dd25cd4dec183f9919ecba90c8edc2
# good: [ce9119ad90b1caba550447bfcc0a21850558ca49] x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
git bisect good ce9119ad90b1caba550447bfcc0a21850558ca49
--
BOFH excuse #204:

Just pick up the phone and give modem connect sounds. "Well you said we should get more lines so we don't have voice lines."

2010-01-06 11:21:40

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Wed, Jan 06, 2010 at 03:06:05AM -0800, Christian Kujau wrote:
> On Wed, 6 Jan 2010 at 14:38, Jeremy Fitzhardinge wrote:
> > this message before, even during development. I'd suspect either a compiler
> > bug, a miscompile, or some bad interaction with another patch.
>
> I'm using the same compiler (gcc-4.4.2-8, binutils-2.20) for the
> (working) 2.6.32 (Linus' git tree) and did a "make
> distclean" to double check, but 2.6.33-rc2/3 just wouldn't boot.
>
> > A bisection would be useful.
>
> I'm *almost* there, only 1 or 2 revisions left, I attached the
> bisect log below. However, now this happens during "xm create" on the
> DomU console:
>
> ------------[ cut here ]------------
> WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130
> noop_apic_write+0x40/0x50()
> Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
> Call Trace:
> [<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
> [<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
> [<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
> [<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
> [<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
> [<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
> [<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
> ---[ end trace a7919e7f17c0a725 ]---
>
...

This one should be fixed by the commit 125580380f418000b1a06d9a54700f1191b6e561
I believe.

-- Cyrill

2010-01-06 12:44:17

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Wed, 6 Jan 2010 at 14:21, Cyrill Gorcunov wrote:
> > ------------[ cut here ]------------
> > WARNING: at /mnt/d1/linux-2.6-git/arch/x86/kernel/apic/apic_noop.c:130
> > noop_apic_write+0x40/0x50()
> > Modules linked in:Pid: 0, comm: swapper Not tainted 2.6.32 #1
> > Call Trace:
> > [<ffffffff81032563>] ? warn_slowpath_common+0x73/0xb0
> > [<ffffffff8101a7c0>] ? noop_apic_write+0x40/0x50
> > [<ffffffff81334160>] ? init_hw_perf_events+0x33d/0x3dd
> > [<ffffffff8100622f>] ? xen_restore_fl_direct_end+0x0/0x1
> > [<ffffffff81333cab>] ? identify_boot_cpu+0x15/0x3e
> > [<ffffffff81333dfe>] ? check_bugs+0x9/0x2e
> > [<ffffffff8132ec6e>] ? start_kernel+0x324/0x334
> > ---[ end trace a7919e7f17c0a725 ]---
> >
> This one should be fixed by the commit 125580380f418000b1a06d9a54700f1191b6e561
> I believe.

Thanks, so within this particular bisection that would mean it's a "good"
revision - it won't boot because it doesn't have this fix, but it's not the
same as the initial problem.

I've run a few more bisections and this is where I have arrived now:

http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/git-bisect_finished.log

...with the last iteration being:


# git bisect good
3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
Author: Brian Gerst <[email protected]>
Date: Wed Dec 9 12:34:40 2009 -0500

x86, 64-bit: Move kernel_thread to C

Prepare for merging with 32-bit.

Signed-off-by: Brian Gerst <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>

:040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978
0bb5fb33914aac10aaf0344fb8cff596378be52a M arch


@Brian, hpa: I've Cc'ed you on this one, here's what I'm whining about:
http://lkml.org/lkml/2010/1/5/489


Please let me know if this makes sense or if the bisection looks
funny/invalid.

Thanks,
Christian.
--
BOFH excuse #353:

Second-system effect.

2010-01-07 19:06:30

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Wed, 6 Jan 2010 at 04:43, Christian Kujau wrote:
> # git bisect good
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> Author: Brian Gerst <[email protected]>
> Date: Wed Dec 9 12:34:40 2009 -0500
>
> x86, 64-bit: Move kernel_thread to C
>
> Prepare for merging with 32-bit.
>
> Signed-off-by: Brian Gerst <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978
> 0bb5fb33914aac10aaf0344fb8cff596378be52a M arch
>

ping?


I'd like to revert all of Brian's commits from Dec 9 12:34:4[0-4] -0500,
one by one, but:

3bd95dfb182969dc6d2a317c150e0df7107608d3 - when reverted, it won't compile
fa4b8f84383ae197e643a46c36bf58ab8dffc95c - but now I cannot revert all the
e840227c141116171c89ab1abb5cc9fee6fdb488 others, git won't let me.
f443ff4201dd25cd4dec183f9919ecba90c8edc2
df59e7bf439918f523ac29e996ec1eebbed60440

I'm pretty much offline for a week now, I just hope this won't get
forgotten before 2.6.33 is released.

Thanks,
Christian.
--
BOFH excuse #191:

Just type 'mv * /dev/null'.

2010-01-07 19:20:07

by H. Peter Anvin

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On 01/06/2010 04:43 AM, Christian Kujau wrote:
>
> I've run a few more bisections and this is where I have arrived now:
>
> http://nerdbynature.de/bits/2.6.33-rc2/xen/bisect/git-bisect_finished.log
>
> ...with the last iteration being:
>
>
> # git bisect good
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> Author: Brian Gerst <[email protected]>
> Date: Wed Dec 9 12:34:40 2009 -0500
>
> x86, 64-bit: Move kernel_thread to C
>
> Prepare for merging with 32-bit.
>
> Signed-off-by: Brian Gerst <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: H. Peter Anvin <[email protected]>
>
> :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978
> 0bb5fb33914aac10aaf0344fb8cff596378be52a M arch
>
>
> @Brian, hpa: I've Cc'ed you on this one, here's what I'm whining about:
> http://lkml.org/lkml/2010/1/5/489
>
>
> Please let me know if this makes sense or if the bisection looks
> funny/invalid.
>

The big difference between the code before and after this commit is that
before, kernel_thread() would initialize the pt_regs structure with
whatever state happened to be passed into it by the caller, whereas
afterwards it is initialized to zero. It's unclear to me why that would
break Xen, but therein lies the problem with paravirtualization... it's
not actually running the same thing as the real architecture.

-hpa
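
A minimal sketch of the difference described above, assuming a cut-down
stand-in for pt_regs. The struct, the helper names and the selector value
are hypothetical and are not taken from the kernel sources; the point is
only that a field such as ss keeps the caller's value in the first case
and becomes 0 in the second:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the kernel's struct pt_regs. */
struct mock_regs {
	unsigned long ip, cs, flags, sp, ss;
};

/* Illustrative selector value, not taken from any kernel header. */
#define MOCK_KERNEL_DS 0x18UL

/* Before: the new thread's registers start as a copy of the caller's state. */
static struct mock_regs regs_from_caller(const struct mock_regs *caller)
{
	assert(caller != NULL);
	return *caller;
}

/* After: the registers start out zeroed, so ss is 0 unless set explicitly. */
static struct mock_regs regs_from_zero(void)
{
	struct mock_regs regs;

	memset(&regs, 0, sizeof(regs));
	return regs;
}
```

Any field that nothing sets explicitly after the memset() stays 0, which
is harmless only if nothing downstream depends on the old value.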

2010-01-07 19:20:30

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Thu, Jan 07, 2010 at 11:06:15AM -0800, Christian Kujau wrote:
> On Wed, 6 Jan 2010 at 04:43, Christian Kujau wrote:
> > # git bisect good
> > 3bd95dfb182969dc6d2a317c150e0df7107608d3 is the first bad commit
> > commit 3bd95dfb182969dc6d2a317c150e0df7107608d3
> > Author: Brian Gerst <[email protected]>
> > Date: Wed Dec 9 12:34:40 2009 -0500
> >
> > x86, 64-bit: Move kernel_thread to C
> >
> > Prepare for merging with 32-bit.
> >
> > Signed-off-by: Brian Gerst <[email protected]>
> > LKML-Reference: <[email protected]>
> > Signed-off-by: H. Peter Anvin <[email protected]>
> >
> > :040000 040000 30b5dd4d6888694ca2967893ef3e662461fe9978
> > 0bb5fb33914aac10aaf0344fb8cff596378be52a M arch
> >
>
> ping?
>
>
> I'd like to revert all of Brian's commits from Dec 9 12:34:4[0-4] -0500,
> one by one, but:
>
> 3bd95dfb182969dc6d2a317c150e0df7107608d3 - when reverted, it won't compile
> fa4b8f84383ae197e643a46c36bf58ab8dffc95c - but now I cannot revert all the
> e840227c141116171c89ab1abb5cc9fee6fdb488 others, git won't let me.
> f443ff4201dd25cd4dec183f9919ecba90c8edc2
> df59e7bf439918f523ac29e996ec1eebbed60440
>
> I'm pretty much offline for a week now, I just hope this won't get
> forgotten before 2.6.33 is released.
>
> Thanks,
> Christian.
> --
> BOFH excuse #191:
>
> Just type 'mv * /dev/null'.

Hi Christian,

for the first "guilty" commit -- it seems to be innocent, at least
from the way kernel_thread is converted into "C". I need time for a
more detailed review :/

-- Cyrill

2010-01-07 19:31:45

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Thu, 7 Jan 2010 at 22:20, Cyrill Gorcunov wrote:
> for the first "guilty" commit -- it seems to be innocent, at least
> from the way kernel_thread is converted into "C". I need time for a
> more detailed review :/

OK, thanks for looking into this.

Christian.
--
BOFH excuse #429:

Temporal anomaly

2010-01-07 19:32:10

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
> The big difference between the code before and after this commit is that
> before, kernel_thread() would initialize the pt_regs structure with
> whatever state happened to be passed into it by the caller, whereas
> afterwards it is initialized to zero.

To be honest, bisection was kinda hazy in the last step (see my previous
mails), but from looking at the bisection log, it's definitely one of
your/Brian's commits (sorry!), so it may be 3bd95dfb in combination with the
other 4 changes. However, with only 3bd95dfb applied, the DomU wouldn't
start at all. With only the other patches applied, the DomU would start,
and then die with a GPF.

Christian.
--
BOFH excuse #191:

Just type 'mv * /dev/null'.

2010-01-07 19:34:16

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Thu, Jan 07, 2010 at 11:31:41AM -0800, Christian Kujau wrote:
> On Thu, 7 Jan 2010 at 22:20, Cyrill Gorcunov wrote:
> > for the first "guilty" commit -- it seems to be innocent, at least
> > from the way kernel_thread is converted into "C". I need time for a
> > more detailed review :/
>
> OK, thanks for looking into this.
>
> Christian.
> --
> BOFH excuse #429:
>
> Temporal anomaly
>

Well, Peter is much more experienced in this area and, as he
already pointed out -- pt_regs is now zeroed with the new kernel_thread...

-- Cyrill

2010-01-08 21:50:44

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Thu, Jan 07, 2010 at 11:30:46AM -0800, Christian Kujau wrote:
> On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
> > The big difference between the code before and after this commit is that
> > before, kernel_thread() would initialize the pt_regs structure with
> > whatever state happened to be passed into it by the caller, whereas
> > afterwards it is initialized to zero.
>
> To be honest, bisection was kinda hazy in the last step (see my previous
> mails), but from looking at the bisection log, it's definitely one of
> your/Brian's commits (sorry!), so it may be 3bd95dfb in combination with the
> other 4 changes. However, with only 3bd95dfb applied, the DomU wouldn't
> start at all. With only the other patches applied, the DomU would start,
> and then die with a GPF.
>
> Christian.
> --
> BOFH excuse #191:
>
> Just type 'mv * /dev/null'.
>

OK, perhaps the patch below is not _that_ stupid so I
would like to get it reviewed and tested if possible.
Just a thought. Wonder if it helps, but it definitely won't
do any harm anyway :)

-- Cyrill
---
x86: kernel_thread -- initialize SS to a known state

Before the kernel_thread was converted into "C" we had
pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).

Though I must admit I didn't find any *explicit* load of
%ss from this structure, it is better to be on the safe side
and set it to a known value.

Signed-off-by: Cyrill Gorcunov <[email protected]>
---
arch/x86/kernel/process.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux-2.6.git/arch/x86/kernel/process.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/process.c
+++ linux-2.6.git/arch/x86/kernel/process.c
@@ -288,6 +288,8 @@ int kernel_thread(int (*fn)(void *), voi
 	regs.es = __USER_DS;
 	regs.fs = __KERNEL_PERCPU;
 	regs.gs = __KERNEL_STACK_CANARY;
+#else
+	regs.ss = __KERNEL_DS;
 #endif

 	regs.orig_ax = -1;

2010-01-09 23:56:58

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Fri, January 8, 2010 13:50, Cyrill Gorcunov wrote:
> Wonder if it helps, but it definitely won't
> do any harm anyway :)

Thanks (again) Cyrill, I'll test your patch late next week, as I'm
travelling right now and don't have access to the Xen box. I can't
wait.....

Christian.
--
BOFH excuse #442:

Trojan horse ran out of hay

2010-01-10 01:50:09

by Brian Gerst

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Fri, Jan 8, 2010 at 4:50 PM, Cyrill Gorcunov <[email protected]> wrote:
> On Thu, Jan 07, 2010 at 11:30:46AM -0800, Christian Kujau wrote:
>> On Thu, 7 Jan 2010 at 11:19, H. Peter Anvin wrote:
>> > The big difference between the code before and after this commit is that
>> > before, kernel_thread() would initialize the pt_regs structure with
>> > whatever state happened to be passed into it by the caller, whereas
>> > afterwards it is initialized to zero.
>>
>> To be honest, bisection was kinda hazy in the last step (see my previous
>> mails), but from looking at the bisection log, it's definitely one of
>> your/Brian's commits (sorry!), so it may be 3bd95dfb in combination with the
>> other 4 changes. However, with only 3bd95dfb applied, the DomU wouldn't
>> start at all. With only the other patches applied, the DomU would start,
>> and then die with a GPF.
>>
>> Christian.
>> --
>> BOFH excuse #191:
>>
>> Just type 'mv * /dev/null'.
>>
>
> OK, perhaps the patch below is not _that_ stupid so I
> would like to get it reviewed and tested if possible.
> Just a thought. Wonder if it helps, but it definitely won't
> do any harm anyway :)
>
> -- Cyrill
> ---
> x86: kernel_thread -- initialize SS to a known state
>
> Before the kernel_thread was converted into "C" we had
> pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
>
> Though I must admit I didn't find any *explicit* load of
> %ss from this structure, it is better to be on the safe side
> and set it to a known value.

It shouldn't make any difference, but maybe Xen is doing something
subtle. In 64-bit mode the %ss segment register is supposed to be
ignored, which is why it is left set to zero. It works properly on
real hardware. It can't hurt anything to put __KERNEL_DS back in, but
I'd just like to know why Xen requires it if this does fix it.

--
Brian Gerst

2010-01-10 08:09:46

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
...
> > ---
> > x86: kernel_thread -- initialize SS to a known state
> >
> > Before the kernel_thread was converted into "C" we had
> > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> >
> > Though I must admit I didn't find any *explicit* load of
> > %ss from this structure, it is better to be on the safe side
> > and set it to a known value.
>
> It shouldn't make any difference, but maybe Xen is doing something
> subtle. In 64-bit mode the %ss segment register is supposed to be
> ignored, which is why it is left set to zero. It works properly on
> real hardware. It can't hurt anything to put __KERNEL_DS back in, but
> I'd just like to know why Xen requires it if this does fix it.

Yeah, I didn't find any explicit %ss reloading for this _particular_
case (as I noted in the patch changelog). So the only suspect is Xen
itself. Since only Christian is able to test, we will see the
results.

>
> --
> Brian Gerst
>
-- Cyrill

2010-01-10 13:00:25

by Ian Campbell

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sun, 2010-01-10 at 11:09 +0300, Cyrill Gorcunov wrote:
> On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
> ...
> > > ---
> > > x86: kernel_thread -- initialize SS to a known state
> > >
> > > Before the kernel_thread was converted into "C" we had
> > > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> > >
> > > Though I must admit I didn't find any *explicit* load of
> > > %ss from this structure, it is better to be on the safe side
> > > and set it to a known value.
> >
> > It shouldn't make any difference, but maybe Xen is doing something
> > subtle. In 64-bit mode the %ss segment register is supposed to be
> > ignored, which is why it is left set to zero. It works properly on
> > real hardware. It can't hurt anything to put __KERNEL_DS back in, but
> > I'd just like to know why Xen requires it if this does fix it.
>
> Yeah, I didn't find any explicit %ss reloading for this _particular_
> case (as I noted in the patch changelog). So the only suspect is Xen
> itself. Since only Christian is able to test, we will see the
> results.

The difference with Xen is that it must squash the RPL of SS (to 3 for
64 bit and 1 for 32 bit, 32 bit doesn't matter here though). Perhaps a
NULL selector can only have RPL==0? (I'm away from my architecture docs
so I can't check). In any case specifying a non-NULL SS selector allows
the squashing to occur correctly.
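
To make the hypothesis concrete, here is a minimal sketch of selector decoding and RPL squashing. It is only a model, not Xen's actual code: the helper names are hypothetical, and 0xe028 is an illustrative non-NULL selector chosen so that squashing yields the ss value e02b seen in the crash dump.

```c
#include <assert.h>
#include <stdint.h>

/* x86 selector layout: bits 15..3 index, bit 2 table indicator, bits 1..0 RPL. */
#define SEL_RPL_MASK 0x3u

/* Requested privilege level encoded in the selector. */
static unsigned sel_rpl(uint16_t sel)
{
	return sel & SEL_RPL_MASK;
}

/* A NULL selector is GDT index 0, whatever the RPL bits: values 0..3. */
static int sel_is_null(uint16_t sel)
{
	return (sel & ~SEL_RPL_MASK) == 0;
}

/* Hypothetical model of squashing: raise the RPL to at least 'ring'. */
static uint16_t squash_rpl(uint16_t sel, unsigned ring)
{
	if (sel_rpl(sel) < ring)
		sel = (uint16_t)((sel & ~SEL_RPL_MASK) | ring);
	return sel;
}

/* Worked examples for the two cases discussed in this thread. */
void selector_examples(void)
{
	/* SS left at 0: squashing gives 3, still a NULL selector but RPL 3. */
	assert(squash_rpl(0, 3) == 3);
	assert(sel_is_null(squash_rpl(0, 3)));

	/* Non-NULL SS (illustrative value): squashing gives a usable selector. */
	assert(squash_rpl(0xe028, 3) == 0xe02b);
	assert(!sel_is_null(squash_rpl(0xe028, 3)));
}
```

Under this model a zero SS becomes 3 after squashing, i.e. a NULL selector with a non-zero RPL, which is the combination suspected of faulting.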

However this is not the cause of the original "Guest switching to user
mode with no user page tables" error. This is down to
commit f443ff4201dd25cd4dec183f9919ecba90c8edc2
Author: Brian Gerst <[email protected]>
Date: Wed Dec 9 12:34:43 2009 -0500

x86: Sync 32/64-bit kernel_thread

Signed-off-by: Brian Gerst <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
which on 64 bit resulted in changing regs.cs from "__KERNEL_CS" to
"__KERNEL_CS | get_kernel_rpl()". The latter seems more logical (and is
correct for 32 bit) but on 64 bit we frequently use a pattern like "testl
$3, CS(%rsp); je foo" to detect return to user vs kernel and
an RPL of 1 will unfortunately incorrectly trigger the return to
userspace paths.
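
A rough C rendering of that check shows why an RPL of 1 goes wrong. This is a sketch only: the function names are hypothetical, 0x10 stands in for __KERNEL_CS as an illustrative value, and 0x33 is the user-looking CS value visible in the guest stack trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * C equivalent of "testl $3, <CS>; je kernel_path": any non-zero RPL
 * bits in the saved CS are taken to mean "returning to user mode".
 */
static int cs_looks_like_user(uint16_t cs)
{
	return (cs & 3) != 0;
}

/* Hypothetical model of regs.cs = __KERNEL_CS | get_kernel_rpl(). */
static uint16_t kernel_thread_cs(uint16_t kernel_cs, unsigned kernel_rpl)
{
	return (uint16_t)(kernel_cs | kernel_rpl);
}

/* Worked examples: kernel_rpl of 1 versus 0 for a kernel thread. */
void cs_examples(void)
{
	/* kernel_rpl == 1: a kernel thread is misclassified as user. */
	assert(cs_looks_like_user(kernel_thread_cs(0x10, 1)));

	/* kernel_rpl == 0: the same thread is correctly seen as kernel. */
	assert(!cs_looks_like_user(kernel_thread_cs(0x10, 0)));

	/* A genuine user CS with RPL 3 is detected either way. */
	assert(cs_looks_like_user(0x33));
}
```

The test only looks at the low two bits, so it cannot tell RPL 1 from RPL 3; the kernel RPL has to be 0 for the check to work.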

The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
bit guests -- the hypervisor already takes care of all the necessary
squashing to ring 3 transparently (because making the guest worry about
it would break the very common assumption that you can distinguish user
from kernel CS by RPL).

With just the CS RPL fix below I see a GPF at kernel_thread_helper with
SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
With both the SS and CS fixes things work fine.

Ian.

---
Subject: xen: 64 bit kernel RPL should be 0.

Under Xen 64 bit guests actually run their kernel in ring 3, however the
hypervisor takes care of squashing the descriptor RPLs transparently (in
order to allow them to continue to differentiate between user and kernel
space CS using the RPL). Therefore the Xen paravirt backend should use
RPL==0 instead of 1 (or 3). Using RPL==1 causes generic arch code to
take incorrect code paths because it uses "testl $3, <CS>, je foo" type
tests for a userspace CS and this considers 1==userspace.

This issue was previously masked because get_kernel_rpl() was omitted
when setting CS in kernel_thread(). This was fixed when kernel_thread()
was unified with 32 bit in f443ff4201dd25cd4dec183f9919ecba90c8edc2.

Signed-off-by: Ian Campbell <[email protected]>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 2b26dd5..36daccb 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1151,9 +1151,13 @@ asmlinkage void __init xen_start_kernel(void)

 	/* keep using Xen gdt for now; no urgent need to change it */

+#ifdef CONFIG_X86_32
 	pv_info.kernel_rpl = 1;
 	if (xen_feature(XENFEAT_supervisor_mode_kernel))
 		pv_info.kernel_rpl = 0;
+#else
+	pv_info.kernel_rpl = 0;
+#endif

 	/* set the limit of our address space */
 	xen_reserve_top();


--
Ian Campbell

BOFH excuse #430:

Mouse has out-of-cheese-error

2010-01-10 13:36:33

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sun, Jan 10, 2010 at 12:59:03PM +0000, Ian Campbell wrote:
> On Sun, 2010-01-10 at 11:09 +0300, Cyrill Gorcunov wrote:
> > On Sat, Jan 09, 2010 at 08:50:04PM -0500, Brian Gerst wrote:
> > ...
> > > > ---
> > > > x86: kernel_thread -- initialize SS to a known state
> > > >
> > > > Before the kernel_thread was converted into "C" we had
> > > > pt_regs::ss set to __KERNEL_DS (by SAVE_ALL asm macro).
> > > >
> > > > Though I must admit I didn't find any *explicit* load of
> > > > %ss from this structure the better to be on a safe side
> > > > and set it to a known value.
> > >
> > > It shouldn't make any difference, but maybe Xen is doing something
> > > subtle. In 64-bit mode the %ss segment register is supposed to be
> > > ignored, which is why it is left set to zero. It works properly on
> > > real hardware. It can't hurt anything to put __KERNEL_DS back in, but
> > > I'd just like to know why Xen requires it if this does fix it.
> >
> > Yeah, I didn't find any explicit %ss reloading for this _particular_
> > case (as I noted in the patch changelog). So the only suspect is Xen
> > itself. Since only Christian is able to test, we will see the
> > results.
>
> The difference with Xen is that it must squash the RPL of SS (to 3 for
> 64 bit and 1 for 32 bit, 32 bit doesn't matter here though). Perhaps a
> NULL selector can only have RPL==0? (I'm away from my architecture docs
> so I can't check). In any case specifying a non-NULL SS selector allows
> the squashing to occur correctly.
>
> However this is not the cause of the original "Guest switching to user
> mode with no user page tables" error. This is down to
> commit f443ff4201dd25cd4dec183f9919ecba90c8edc2
> Author: Brian Gerst <[email protected]>
> Date: Wed Dec 9 12:34:43 2009 -0500
>
> x86: Sync 32/64-bit kernel_thread
>
> Signed-off-by: Brian Gerst <[email protected]>
> LKML-Reference: <[email protected]>
> Signed-off-by: H. Peter Anvin <[email protected]>
> which on 64 bit resulted in changing regs.cs from "__KERNEL_CS" to
> "__KERNEL_CS | get_kernel_rpl()". The latter seems more logical (and is
> correct for 32 bit) but on 64 bit we frequently use a pattern like "testl
> $3, CS(%rsp); je foo" to detect return to user vs kernel and
> an RPL of 1 will unfortunately incorrectly trigger the return to
> userspace paths.
>
> The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> bit guests -- the hypervisor already takes care of all the necessary
> squashing to ring 3 transparently (because making the guest worry about
> it would break the very common assumption that you can distinguish user
> from kernel CS by RPL).
>
> With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> With both the SS and CS fixes things work fine.

Loading either CS or SS with a NULL selector should lead to #GP

>
> Ian.
>
> ---
> Subject: xen: 64 bit kernel RPL should be 0.
>
...

Good catch, Ian! I noticed that Xen uses its own get_kernel_rpl
while discussing this problem in a chat. But I must admit *I simply don't know*
what Xen does, or how it works internally (nor do I have the will to learn it at
the moment :)

That said -- I'm happy if your patch fixes the problem (and it looks like
get_kernel_rpl is indeed guilty here).

-- Cyrill

2010-01-10 13:49:55

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sun, Jan 10, 2010 at 04:36:28PM +0300, Cyrill Gorcunov wrote:
...
> >
> > With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> > SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> > With both the SS and CS fixes things work fine.
>
> Loading either CS or SS with a NULL selector should lead to #GP
>

though SS with RPL=0 is allowed to be a NULL selector in 64-bit mode

> >
> > Ian.
> >
> > ---
> > Subject: xen: 64 bit kernel RPL should be 0.
> >
> ...
>
> Good catch, Ian! I noticed that Xen uses its own get_kernel_rpl
> while discussing this problem in a chat. But I must admit *I simply don't know*
> what Xen does, or how it works internally (nor do I have the will to learn it at
> the moment :)
>
> That said -- I'm happy if your patch fixes the problem (and it looks like
> get_kernel_rpl is indeed guilty here).
>
> -- Cyrill

-- Cyrill

2010-01-10 14:11:22

by Ian Campbell

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sun, 2010-01-10 at 16:49 +0300, Cyrill Gorcunov wrote:
> On Sun, Jan 10, 2010 at 04:36:28PM +0300, Cyrill Gorcunov wrote:
> ...
> > >
> > > With just the CS RPL fix below I see a GPF at kernel_thread_helper with
> > > SS=3 (hence my hypothesis about NULL selectors and non-zero RPL above).
> > > With both the SS and CS fixes things work fine.
> >
> > Loading either CS or SS with a NULL selector should lead to #GP
> >
>
> though SS with RPL=0 is allowed to be a NULL selector in 64-bit mode

yes, that's what I meant.

Ian.

--
Ian Campbell

Tussman's Law:
Nothing is as inevitable as a mistake whose time has come.

2010-01-15 08:38:07

by Christian Kujau

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> bit guests -- the hypervisor already takes care of all the necessary
> squashing to ring 3 transparently (because making the guest worry about
> it would break the very common assumption that you can distinguish user
> from kernel CS by RPL).

Yes, it's a 64-bit guest; I should have mentioned this from the beginning.
With the 2 patches from Ian and Cyrill applied, the DomU is now booting
fine again (currently running mainline -git).

Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
working with both patches applied, should I try to test with only Ian's
patch (for arch/x86/xen/enlighten.c) applied?

In any case, feel free to add:

Tested-by: Christian Kujau <[email protected]>

Thanks so much for your efforts to everyone involved!

Christian.
--
BOFH excuse #436:

Daemon escaped from pentagram

2010-01-15 11:29:27

by Ian Campbell

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Fri, 2010-01-15 at 00:36 -0800, Christian Kujau wrote:
> On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> > The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> > bit guests -- the hypervisor already takes care of all the necessary
> > squashing to ring 3 transparently (because making the guest worry about
> > it would break the very common assumption that you can distinguish user
> > from kernel CS by RPL).
>
> Yes, it's a 64-bit guest; I should have mentioned this from the beginning.

That's OK, I already knew because only 64 bit guests have a separate
user page table.

> With the 2 patches from Ian and Cyrill applied, the DomU is now booting
> fine again (currently running mainline -git).

Excellent. These patches are both now in -tip. They are in the urgent
branch so I assume they will be heading to mainline before too long.

> Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> working with both patches applied, should I try to test with only Ian's
> patch (for arch/x86/xen/enlighten.c) applied?

It's OK, both patches are definitely required to fix 64 bit guests so
there is no point in testing just one or the other.

Ian.

--
Ian Campbell
Current Noise: Exodus - Scar Spangled Banner

War is much too serious a matter to be entrusted to the military.
-- Clemenceau

2010-01-15 12:00:40

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Fri, Jan 15, 2010 at 12:36:42AM -0800, Christian Kujau wrote:
> On Sun, 10 Jan 2010 at 12:59, Ian Campbell wrote:
> > The correct fix is for the Xen backend to declare kernel RPL == 0 for 64
> > bit guests -- the hypervisor already takes care of all the necessary
> > squashing to ring 3 transparently (because making the guest worry about
> > it would break the very common assumption that you can distinguish user
> > from kernel CS by RPL).
>
> Yes, it's a 64-bit guest; I should have mentioned this from the beginning.
> With the 2 patches from Ian and Cyrill applied, the DomU is now booting
> fine again (currently running mainline -git).
>
> Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> working with both patches applied, should I try to test with only Ian's
> patch (for arch/x86/xen/enlighten.c) applied?
>
> In any case, feel free to add:
>
> Tested-by: Christian Kujau <[email protected]>
>
> Thanks so much for your efforts to everyone involved!
>
> Christian.
> --
> BOFH excuse #436:
>
> Daemon escaped from pentagram
>

Well, I think Ian's patch is the key here and mine should be
dropped then. Thanks for testing!

-- Cyrill

2010-01-15 12:03:14

by Cyrill Gorcunov

Subject: Re: 2.6.33-rc2: Xen/Guest switching to user mode with no user page tables

On Fri, Jan 15, 2010 at 11:29:10AM +0000, Ian Campbell wrote:
...
>
> > Cyrill: with your patch alone (for arch/x86/kernel/process.c), the DomU
> > is still not booting, Dom0 "xm dmesg" reporting the same error. As it's
> > working with both patches applied, should I try to test with only Ian's
> > patch (for arch/x86/xen/enlighten.c) applied?
>
> It's OK, both patches are definitely required to fix 64 bit guests so
> there is no point in testing just one or the other.
>

ah, ok, so be it. Thanks!

> Ian.
>
> --
> Ian Campbell
> Current Noise: Exodus - Scar Spangled Banner
>
> War is much too serious a matter to be entrusted to the military.
> -- Clemenceau
>
-- Cyrill