2008-08-31 17:26:59

by Bernhard Schmidt

[permalink] [raw]
Subject: 2.6.27-rc xen pvops regression?

Hello,

I've been a happy user of vanilla 2.6.26 with pvops in my Xen domU for
a couple of months. I wanted to test 2.6.27-rc (-rc3 a couple of weeks
ago, -rc5 just now) to verify another issue and ran into the following
error

Using config file "/etc/xen/miredo2.cfg".
Started domain miredo2
0: 0x00000395 -> 0x00010000
Using APIC driver default
Could not enable APIC!
Allocating PCI resources starting at 20000000 (gap: 10000000:f0000000)
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64861
Kernel command line: root=/dev/xvda1 ro 3 console=xvc0 lapic
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 1024 (order: 10, 4096 bytes)
Detected 2800.098 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [hvc0] enabled
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 256164k/262144k available (1270k kernel code, 5152k reserved, 496k data, 156k init, 0k highmem)
virtual kernel memory layout:
fixmap : 0xf57a9000 - 0xf57fd000 ( 336 kB)
pkmap : 0xf5200000 - 0xf5400000 (2048 kB)
vmalloc : 0xd0800000 - 0xf51fe000 ( 585 MB)
lowmem : 0xc0000000 - 0xd0000000 ( 256 MB)
.init : 0xc02bc000 - 0xc02e3000 ( 156 kB)
.data : 0xc023dbba - 0xc02b9fa0 ( 496 kB)
.text : 0xc0100000 - 0xc023dbba (1270 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=12, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
installing Xen timer for CPU 0
Calibrating delay loop (skipped), value calculated using timer frequency.. 5600.19 BogoMIPS (lpj=11200392)
Mount-cache hash table entries: 512
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz stepping 04
Freeing SMP alternatives: 0k freed
initcall spawn_ksoftirqd+0x0/0x30 returned with disabled interrupts
initcall spawn_softlockup_task+0x0/0x48 returned with disabled interrupts
initcall relay_init+0x0/0x3 returned with disabled interrupts
net_namespace: 488 bytes
initcall net_ns_init+0x0/0x106 returned with disabled interrupts
initcall reboot_init+0x0/0xd returned with disabled interrupts
Booting paravirtualized kernel on Xen
Xen version: 3.2.1-rc1-pre
initcall print_banner+0x0/0x9 returned with disabled interrupts
initcall sysctl_init+0x0/0x24 returned with disabled interrupts
initcall ksysfs_init+0x0/0x92 returned with disabled interrupts
initcall init_jiffies_clocksource+0x0/0xa returned with disabled interrupts
initcall filelock_init+0x0/0x25 returned with disabled interrupts
initcall init_script_binfmt+0x0/0xa returned with disabled interrupts
initcall init_elf_binfmt+0x0/0xa returned with disabled interrupts
initcall random32_init+0x0/0x81 returned with disabled interrupts
Grant table initialized
initcall gnttab_init+0x0/0x168 returned with disabled interrupts
initcall sock_init+0x0/0x4a returned with disabled interrupts
NET: Registered protocol family 16
initcall netlink_proto_init+0x0/0xf8 returned with disabled interrupts
nitcall bdi_class_init+0x0/0x20 returned with disabled interrupts
initcall kobject_uevent_init+0x0/0x3b returned with disabled interrupts

and freeze. The base system is Ubuntu hardy on i386, Xen 3.2.1-rc1 as
packaged.

Please find my .config attached. Basically the very same .config ran fine
with 2.6.26 (I copied it over and ran "make oldconfig").

Bernhard


Attachments:
(No filename) (3.44 kB)
config-2.6.27-rc5-pvops (20.38 kB)
Download all attachments

2008-08-31 22:12:43

by Alex Nixon

[permalink] [raw]
Subject: Re: 2.6.27-rc xen pvops regression?

Bernhard Schmidt wrote:
> Hello,
>
> I've been a happy user of vanilla 2.6.26 with pvops in my Xen domU for
> a couple of months. I wanted to test 2.6.27-rc (-rc3 a couple of weeks
> ago, -rc5 just now) to verify another issue and ran into the following
> error
>
Thanks Bernhard - I'm able to reproduce it with your config.

I've had a look, and it seems the register_vcpu_info hypercall is being
made twice, failing on the second time. In fact,
paravirt_pagetable_setup_{start,done} are both being called twice - once
from arch/x86/kernel/setup.c, and then again from arch/x86/mm/init_32.c

I suspect one of them should go, but I'm not familiar enough with this
code to suggest which one.

But still with that crudely resolved, the kernel BUGs (see below).
Jeremy - any ideas?

- Alex


`xm dmesg':

(XEN) mm.c:2249:d168 Mfn 24384 not pinned

dmesg:

EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 160k freed
------------[ cut here ]------------
kernel BUG at
/local/scratch/hotplug.linux.trees.git/arch/x86/xen/enlighten.c:850!
invalid opcode: 0000 [#1]

Pid: 1, comm: init Tainted: G W
(2.6.27-rc5-tip-00672-ge5c5407-dirty #52)
EIP: 0061:[<c0102b04>] EFLAGS: 00010282 CPU: 0
EIP is at pin_pagetable_pfn+0x3f/0x4b
EAX: ffffffea EBX: e242dd74 ECX: 00000001 EDX: 00000000
ESI: 00007ff0 EDI: 00000000 EBP: e242dd8c ESP: e242dd74
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: e021
Process init (pid: 1, ti=e242c000 task=e242a000 task.ti=e242c000)
Stack: 00000004 00024384 00000000 c144e8e0 00022747 00000000 e242dd9c
c0102ee8
c02fe5d4 c144e8e0 e242dda4 c0102f23 e242ddbc c0116722 00000000
00000000
00000000 00000000 e242de2c c01467a8 00000000 c1000000 00000000
22747067
Call Trace:
[<c0102ee8>] ? xen_release_ptpage+0x52/0x71
[<c0102f23>] ? xen_release_pte+0xd/0xf
[<c0116722>] ? __pte_free_tlb+0x35/0x4b
[<c01467a8>] ? free_pgd_range+0x377/0x38c
[<c015a4bf>] ? setup_arg_pages+0x1b8/0x22b
[<c017a80f>] ? load_elf_binary+0x3f1/0x10a4
[<c0104ffa>] ? check_events+0x8/0xe
[<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
[<c0104937>] ? xen_force_evtchn_callback+0xf/0x14
[<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
[<c014298a>] ? page_address+0x85/0xa2
[<c0159c51>] ? put_arg_page+0x8/0xa
[<c0159e27>] ? copy_strings+0x156/0x160
[<c0159ea8>] ? search_binary_handler+0x42/0x10a
[<c015ad64>] ? do_execve+0x129/0x192
[<c0105354>] ? sys_execve+0x29/0x50
[<c0106b1e>] ? syscall_call+0x7/0xb
[<c0109b2f>] ? kernel_execve+0x17/0x1c
[<c0102140>] ? run_init_process+0x17/0x19
[<c01021e3>] ? init_post+0xa1/0xf2
[<c01077df>] ? kernel_thread_helper+0x7/0x10

2008-08-31 23:01:35

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: 2.6.27-rc xen pvops regression?

Alex Nixon wrote:
> Bernhard Schmidt wrote:
>> Hello,
>>
>> I've been a happy user of vanilla 2.6.26 with pvops in my Xen domU for
>> a couple of months. I wanted to test 2.6.27-rc (-rc3 a couple of weeks
>> ago, -rc5 just now) to verify another issue and ran into the following
>> error
>>
> Thanks Bernhard - I'm able to reproduce it with your config.

Is this the crash with the UP config?

> I've had a look, and it seems the register_vcpu_info hypercall is
> being made twice, failing on the second time. In fact,
> paravirt_pagetable_setup_{start,done} are both being called twice -
> once from arch/x86/kernel/setup.c, and then again from
> arch/x86/mm/init_32.c

Hm, yes. Looks like a unifcation side-effect, though I thought I'd run
a 32-bit kernel recently enough to notice this. Anyway, I think the
init_32.c calls should be redundant.

> I suspect one of them should go, but I'm not familiar enough with this
> code to suggest which one.
>
> But still with that crudely resolved, the kernel BUGs (see below).
> Jeremy - any ideas?
>
> - Alex
>
>
> `xm dmesg':
>
> (XEN) mm.c:2249:d168 Mfn 24384 not pinned

Bit hard to know what this means without knowing what mfn 24384 is.

J

>
> dmesg:
>
> EXT3-fs: mounted filesystem with ordered data mode.
> VFS: Mounted root (ext3 filesystem) readonly.
> Freeing unused kernel memory: 160k freed
> ------------[ cut here ]------------
> kernel BUG at
> /local/scratch/hotplug.linux.trees.git/arch/x86/xen/enlighten.c:850!
> invalid opcode: 0000 [#1]
>
> Pid: 1, comm: init Tainted: G W
> (2.6.27-rc5-tip-00672-ge5c5407-dirty #52)
> EIP: 0061:[<c0102b04>] EFLAGS: 00010282 CPU: 0
> EIP is at pin_pagetable_pfn+0x3f/0x4b
> EAX: ffffffea EBX: e242dd74 ECX: 00000001 EDX: 00000000
> ESI: 00007ff0 EDI: 00000000 EBP: e242dd8c ESP: e242dd74
> DS: 007b ES: 007b FS: 0000 GS: 0000 SS: e021
> Process init (pid: 1, ti=e242c000 task=e242a000 task.ti=e242c000)
> Stack: 00000004 00024384 00000000 c144e8e0 00022747 00000000 e242dd9c
> c0102ee8
> c02fe5d4 c144e8e0 e242dda4 c0102f23 e242ddbc c0116722 00000000
> 00000000
> 00000000 00000000 e242de2c c01467a8 00000000 c1000000 00000000
> 22747067
> Call Trace:
> [<c0102ee8>] ? xen_release_ptpage+0x52/0x71
> [<c0102f23>] ? xen_release_pte+0xd/0xf
> [<c0116722>] ? __pte_free_tlb+0x35/0x4b
> [<c01467a8>] ? free_pgd_range+0x377/0x38c
> [<c015a4bf>] ? setup_arg_pages+0x1b8/0x22b
> [<c017a80f>] ? load_elf_binary+0x3f1/0x10a4
> [<c0104ffa>] ? check_events+0x8/0xe
> [<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
> [<c0104937>] ? xen_force_evtchn_callback+0xf/0x14
> [<c0104f71>] ? xen_restore_fl_direct_end+0x0/0x3
> [<c014298a>] ? page_address+0x85/0xa2
> [<c0159c51>] ? put_arg_page+0x8/0xa
> [<c0159e27>] ? copy_strings+0x156/0x160
> [<c0159ea8>] ? search_binary_handler+0x42/0x10a
> [<c015ad64>] ? do_execve+0x129/0x192
> [<c0105354>] ? sys_execve+0x29/0x50
> [<c0106b1e>] ? syscall_call+0x7/0xb
> [<c0109b2f>] ? kernel_execve+0x17/0x1c
> [<c0102140>] ? run_init_process+0x17/0x19
> [<c01021e3>] ? init_post+0xa1/0xf2
> [<c01077df>] ? kernel_thread_helper+0x7/0x10
>

2008-08-31 23:48:09

by Bernhard Schmidt

[permalink] [raw]
Subject: Re: 2.6.27-rc xen pvops regression?

On Sun, Aug 31, 2008 at 04:01:23PM -0700, Jeremy Fitzhardinge wrote:

> >> I've been a happy user of vanilla 2.6.26 with pvops in my Xen domU for
> >> a couple of months. I wanted to test 2.6.27-rc (-rc3 a couple of weeks
> >> ago, -rc5 just now) to verify another issue and ran into the following
> >> error
> > Thanks Bernhard - I'm able to reproduce it with your config.
> Is this the crash with the UP config?

Ah, good hint, enabling SMP in the domU indeed fixes the problem.

> > (XEN) mm.c:2249:d168 Mfn 24384 not pinned
> Bit hard to know what this means without knowing what mfn 24384 is.

FWIW, I don't have any message in my "xm dmesg".

Bernhard