Hello,
I'm booting some latest kernels on a Fedora 11 (released June 2009)
guest. After the recent change of default to vsyscall=emulate, the
guest fails to boot (init segfaults).
I also tried vsyscall=none, as suggested by hpa, and that fails as
well. Only vsyscall=native works fine.
The commit that introduced the kernel parameter,
3ae36655b97a03fa1decf72f04078ef945647c1a
is bad too.
The host system is RHEL6, if that matters.
Please let me know if you need any more information or testing.
Amit
On Fri, Feb 3, 2012 at 12:27 AM, Amit Shah <[email protected]> wrote:
> Hello,
>
> I'm booting some latest kernels on a Fedora 11 (released June 2009)
> guest. After the recent change of default to vsyscall=emulate, the
> guest fails to boot (init segfaults).
Which kernel is the host running and which kernel is the guest
running? And which kernel has the vsyscall=emulate parameter? If
vsyscall=emulate is a problem on a pre-3.3 kernel, can you try
something containing commit 4fc3490114bb159bd4fff1b3c96f4320fe6fb08f?
(UML, for example, is known to have serious issues without that fix.)
Otherwise, can you tell me what hypervisor you're using and what init
version (i.e. the rpm) so I can try to reproduce it? A pointer to an
actual image would work, too. A copy of the oops would also be nice.
--Andy
Hi,
On (Fri) 03 Feb 2012 [07:30:55], Andy Lutomirski wrote:
> On Fri, Feb 3, 2012 at 12:27 AM, Amit Shah <[email protected]> wrote:
> > Hello,
> >
> > I'm booting some latest kernels on a Fedora 11 (released June 2009)
> > guest. After the recent change of default to vsyscall=emulate, the
> > guest fails to boot (init segfaults).
>
> Which kernel is the host running
Host is a RHEL6 kernel, 2.6.32-217.el6.x86_64
> and which kernel is the guest
> running? And which kernel has the vsyscall=emulate parameter?
The host stays the same; I'm only using the x86-64 F11 guest with
newer kernel versions. I compile the kernels on the host, and use
qemu's -kernel parameter so the guest boots off that kernel.
Only the guest gets the vsyscall= parameters.
> If
> vsyscall=emulate is a problem on a pre-3.3 kernel, can you try
> something containing commit 4fc3490114bb159bd4fff1b3c96f4320fe6fb08f?
> (UML, for example, is known to have serious issues without that fix.)
I've tried all kernels v3.0 to v3.3-git. From the commit that
introduced the vsyscall=emulate parameter, using 'emulate' has failed
to boot this guest. I only noticed it recently when it was made the
default.
> Otherwise, can you tell me what hypervisor you're using
Sorry, I'm using kvm. qemu is also from RHEL6,
qemu-kvm-0.12.1.2-2.209.el6.x86_64, but even upstream qemu.git makes
init fail similarly.
> and what init
> version (i.e. the rpm) so I can try to reproduce it?
upstart-0.6.5-10.el6.x86_64
> A pointer to an
> actual image would work, too.
It's mostly a stock F11 install, so fetching the iso and installing it
locally, and using a command line similar to:
qemu-kvm -snapshot -kernel ~/src/linux/arch/x86/boot/bzImage
/guests/f11-auto.qcow2 -serial stdio -append 'console=tty0
console=ttyS0 root=/dev/sda2 vsyscall=emulate'
will work.
> A copy of the oops would also be nice.
There's not much, but here it is anyway:
EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
debug: unmapping init memory ffffffff8167f000..ffffffff818e1000
Write protecting the kernel read-only data: 6144k
debug: unmapping init memory ffff8800012fe000..ffff880001400000
debug: unmapping init memory ffff880001584000..ffff880001600000
init[1]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff103d72f8 error 5
Kernel panic - not syncing: Attempted to kill init!
That's with current git snapshot. With the commit that introduced
vsyscall= I had gotten this:
EXT4-fs (sda2): couldn't mount as ext3 due to feature incompatibilities
EXT4-fs (sda2): couldn't mount as ext2 due to feature incompatibilities
EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
debug: unmapping init memory ffffffff81679000..ffffffff818db000
Write protecting the kernel read-only data: 6144k
debug: unmapping init memory ffff8800012e6000..ffff880001400000
debug: unmapping init memory ffff880001579000..ffff880001600000
init[1]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff9c8ba098 error 5
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 3.0.0+ #189
Call Trace:
[<ffffffff812de4e9>] panic+0x9b/0x1a2
[<ffffffff8102ba99>] ? get_parent_ip+0x11/0x41
[<ffffffff81039ff6>] do_exit+0xb0/0x6f0
[<ffffffff8103a6bf>] do_group_exit+0x89/0xb7
[<ffffffff81048ffa>] get_signal_to_deliver+0x419/0x438
[<ffffffff810016dd>] do_signal+0x72/0x5e4
[<ffffffff8101e827>] ? do_page_fault+0x177/0x338
[<ffffffff812de631>] ? printk+0x41/0x48
[<ffffffff810dd93d>] ? discard_slab+0x3e/0x40
[<ffffffff810dea12>] ? __slab_free+0x13a/0x145
[<ffffffff810ef4fd>] ? putname+0x32/0x3b
[<ffffffff810ef4fd>] ? putname+0x32/0x3b
[<ffffffff810df179>] ? kmem_cache_free+0x7d/0xce
[<ffffffff812e1adf>] ? retint_signal+0x11/0x92
[<ffffffff81001c69>] do_notify_resume+0x1a/0x37
[<ffffffff812e1b1b>] retint_signal+0x4d/0x92
Amit
On (Fri) 03 Feb 2012 [13:57:48], Amit Shah wrote:
> Hello,
>
> I'm booting some latest kernels on a Fedora 11 (released June 2009)
> guest. After the recent change of default to vsyscall=emulate, the
> guest fails to boot (init segfaults).
>
> I also tried vsyscall=none, as suggested by hpa, and that fails as
> well. Only vsyscall=native works fine.
>
> The commit that introduced the kernel parameter,
>
> 3ae36655b97a03fa1decf72f04078ef945647c1a
>
> is bad too.
I suggest we revert 2e57ae0515124af45dd889bfbd4840fd40fcc07d till we
track down and fix the vsyscall=emulate case.
Amit
On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah <[email protected]> wrote:
> On (Fri) 03 Feb 2012 [13:57:48], Amit Shah wrote:
>> Hello,
>>
>> I'm booting some latest kernels on a Fedora 11 (released June 2009)
>> guest. After the recent change of default to vsyscall=emulate, the
>> guest fails to boot (init segfaults).
>>
>> I also tried vsyscall=none, as suggested by hpa, and that fails as
>> well. Only vsyscall=native works fine.
>>
>> The commit that introduced the kernel parameter,
>>
>> 3ae36655b97a03fa1decf72f04078ef945647c1a
>>
>> is bad too.
>
> I suggest we revert 2e57ae0515124af45dd889bfbd4840fd40fcc07d till we
> track down and fix the vsyscall=emulate case.
Hi-
Sorry, I lost track of this one. I can't reproduce it, although I
doubt I've set up the right test environment. But this is fishy:
init[1]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff9c8ba098 error 5
Error 5, if I'm decoding it correctly, is a userspace read (i.e. not
execute) fault. The vsyscall emulation changes shouldn't have had any
effect on reads there.
Can you try booting the initramfs here:
http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
with your kernel image (i.e. qemu-kvm -kernel <whatever> -initrd
vsyscall_initramfs.img -whatever_else) and seeing what happens? It
works for me. That image is just a modern static build (i.e. built on
F16) of this code:
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Signature of the legacy time() entry point in the vsyscall page. */
typedef time_t (*vsys_time_t)(time_t *);

int main(void)
{
        vsys_time_t vsys_time = (vsys_time_t)(0xffffffffff600400);
        unsigned char *p = (unsigned char *)0xffffffffff600400;
        int i;

        /* Execute from the vsyscall page first ... */
        printf("The time is %ld\n", (long)( vsys_time(0) ));

        /* ... then read the same bytes back as data. */
        printf("The first few bytes are:\n");
        for (i = 0; i < 16; i++) {
                unsigned char c = p[i];
                printf("%02x ", (int)c);
        }
        printf("\n");

        printf("All done\n");

        /* Sleep forever rather than exit. */
        while(1)
                pause();
}
I'm also curious what happens if you run without kvm (i.e. straight
qemu) and what your .config on the guest kernel is. It sounds like
something's wrong with your fixmap, which makes me wonder if your
qemu/kernel combo is capable of booting even a modern distro
(up-to-date F16, say) -- the vvar page uses identical fixmap flags as
the vsyscall page in vsyscall=emulate and vsyscall=none mode.
What host cpu are you on and what qemu flags do you use? Maybe
something is wrong with your emulator.
--Andy
On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah <[email protected]> wrote:
> > On (Fri) 03 Feb 2012 [13:57:48], Amit Shah wrote:
> >> Hello,
> >>
> >> I'm booting some latest kernels on a Fedora 11 (released June 2009)
> >> guest. After the recent change of default to vsyscall=emulate, the
> >> guest fails to boot (init segfaults).
> >>
> >> I also tried vsyscall=none, as suggested by hpa, and that fails as
> >> well. Only vsyscall=native works fine.
> >>
> >> The commit that introduced the kernel parameter,
> >>
> >> 3ae36655b97a03fa1decf72f04078ef945647c1a
> >>
> >> is bad too.
> >
> > I suggest we revert 2e57ae0515124af45dd889bfbd4840fd40fcc07d till we
> > track down and fix the vsyscall=emulate case.
>
> Hi-
>
> Sorry, I lost track of this one. I can't reproduce it, although I
> doubt I've set up the right test environment. But this is fishy:
>
> init[1]: segfault at ffffffffff600400 ip ffffffffff600400 sp
> 00007fff9c8ba098 error 5
>
> Error 5, if I'm decoding it correctly, is a userspace read (i.e. not
> execute) fault. The vsyscall emulation changes shouldn't have had any
> effect on reads there.
>
> Can you try booting the initramfs here:
> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
> with your kernel image (i.e. qemu-kvm -kernel <whatever> -initrd
> vsyscall_initramfs.img -whatever_else) and seeing what happens? It
> works for me.
This too results in a similar error.
> I'm also curious what happens if you run without kvm (i.e. straight
> qemu)
Interesting; without kvm, this does work fine.
> and what your .config on the guest kernel is. It sounds like
> something's wrong with your fixmap, which makes me wonder if your
> qemu/kernel combo is capable of booting even a modern distro
> (up-to-date F16, say) -- the vvar page uses identical fixmap flags as
> the vsyscall page in vsyscall=emulate and vsyscall=none mode.
I didn't try a modern distro, but looks like this is enough evidence
for now to check the kvm emulator code. I tried the same guests on a
newer kernel (Fedora 16's 3.2), and things worked fine except for
vsyscall=none, panic message below.
> What host cpu are you on and what qemu flags do you use?
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
stepping : 11
cpu MHz : 2000.000
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority
bogomips : 4654.73
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
> Maybe
> something is wrong with your emulator.
Yes, looks like it. Thanks!
This is what I get with vsyscall=none, where emulate and native work
fine on the 3.2 kernel on different host hardware, the guest stays the
same:
[ 2.874661] debug: unmapping init memory ffffffff8167f000..ffffffff818dc000
[ 2.876778] Write protecting the kernel read-only data: 6144k
[ 2.879111] debug: unmapping init memory ffff880001318000..ffff880001400000
[ 2.881242] debug: unmapping init memory ffff8800015a0000..ffff880001600000
[ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
[ 2.888078] init[1]: segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff2f48fe18 error 15
[ 2.888193] Refined TSC clocksource calibration: 2691.293 MHz.
[ 2.892748]
[ 2.895219] Kernel panic - not syncing: Attempted to kill init!
Amit
Hi, kvm people-
Here's a strange failure. It could be a bug in something
RHEL6-specific, but it could be a generic issue that only triggers
with a paravirt guest with old userspace on a non-ept host. There was
a bug like this on Xen, and I'm wondering if something's wrong on kvm as
well.
For background, a change in 3.1 (IIRC) means that, when
vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
NX. It seems like Amit's machine is marking the physical PTE present
but unreadable. So I could have messed up, or there could be a subtle
bug somewhere. Any ideas?
I'll try to reproduce on a non-ept host later on, but that will
involve finding one.
On Wed, Feb 15, 2012 at 3:01 AM, Amit Shah <[email protected]> wrote:
> On (Tue) 14 Feb 2012 [08:26:22], Andy Lutomirski wrote:
>> On Tue, Feb 14, 2012 at 4:22 AM, Amit Shah <[email protected]> wrote:
>> Can you try booting the initramfs here:
>> http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
>> with your kernel image (i.e. qemu-kvm -kernel <whatever> -initrd
>> vsyscall_initramfs.img -whatever_else) and seeing what happens? It
>> works for me.
>
> This too results in a similar error.
Can you post the exact error? I'm interested in how far it gets
before it fails.
> I didn't try a modern distro, but looks like this is enough evidence
> for now to check the kvm emulator code. I tried the same guests on a
> newer kernel (Fedora 16's 3.2), and things worked fine except for
> vsyscall=none, panic message below.
vsyscall=none isn't supposed to work unless you're running a very
modern distro *and* you have no legacy static binaries *and* you
aren't using anything written in Go (sigh). It will probably either
never become the default or will take 5-10 years.
> model name : Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
> flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dts tpr_shadow vnmi flexpriority
Hmm. You don't have ept. If your guest kernel supports paravirt,
then you might use the hypercall interface instead of programming the
fixmap directly.
>
> This is what I get with vsyscall=none, where emulate and native work
> fine on the 3.2 kernel on different host hardware, the guest stays the
> same:
>
>
> [ 2.874661] debug: unmapping init memory ffffffff8167f000..ffffffff818dc000
> [ 2.876778] Write protecting the kernel read-only data: 6144k
> [ 2.879111] debug: unmapping init memory ffff880001318000..ffff880001400000
> [ 2.881242] debug: unmapping init memory ffff8800015a0000..ffff880001600000
> [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
This line (vsyscall attempted) means that the emulation worked
correctly. Your other traces didn't have it or anything like it,
which mostly rules out do_emulate_vsyscall issues.
--Andy
On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
> Hi, kvm people-
>
> Here's a strange failure. It could be a bug in something
> RHEL6-specific, but it could be a generic issue that only triggers
> with a paravirt guest with old userspace on a non-ept host. There was
> a bug like this on Xen, and I'm wondering if something's wrong on kvm as
> well.
>
> For background, a change in 3.1 (IIRC) means that, when
> vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
> NX. It seems like Amit's machine is marking the physical PTE present
> but unreadable.
No such thing as present and unreadable, without EPT.
> So I could have messed up, or there could be a subtle
> bug somewhere. Any ideas?
What's the code trying to do? Execute an instruction from a
non-executable page, trap the #PF, and emulate? And what are the
symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
>
> I'll try to reproduce on a non-ept host later on, but that will
> involve finding one.
rmmod kvm-intel
modprobe kvm-intel ept=0
> Hmm. You don't have ept. If your guest kernel supports paravirt,
> then you might use the hypercall interface instead of programming the
> fixmap directly.
There is no hypercall interface for writing page tables in kvm.
>
> >
> > This is what I get with vsyscall=none, where emulate and native work
> > fine on the 3.2 kernel on different host hardware, the guest stays the
> > same:
> >
> >
> > [ 2.874661] debug: unmapping init memory ffffffff8167f000..ffffffff818dc000
> > [ 2.876778] Write protecting the kernel read-only data: 6144k
> > [ 2.879111] debug: unmapping init memory ffff880001318000..ffff880001400000
> > [ 2.881242] debug: unmapping init memory ffff8800015a0000..ffff880001600000
> > [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
>
> This line (vsyscall attempted) means that the emulation worked
> correctly. Your other traces didn't have it or anything like it,
> which mostly rules out do_emulate_vsyscall issues.
>
Can you point me at the code in question?
Amit, a trace would be nice.
--
error compiling committee.c: too many arguments to function
On Thu, Feb 16, 2012 at 8:17 AM, Avi Kivity <[email protected]> wrote:
> On 02/15/2012 09:36 PM, Andy Lutomirski wrote:
>> Hi, kvm people-
>>
>> Here's a strange failure. It could be a bug in something
>> RHEL6-specific, but it could be a generic issue that only triggers
>> with a paravirt guest with old userspace on a non-ept host. There was
>> a bug like this on Xen, and I'm wondering if something's wrong on kvm as
>> well.
>>
>> For background, a change in 3.1 (IIRC) means that, when
>> vsyscall=emulate or vsyscall=none, the vsyscall page in the fixmap is
>> NX. It seems like Amit's machine is marking the physical PTE present
>> but unreadable.
>
> No such thing as present and unreadable, without EPT.
>
>> So I could have messed up, or there could be a subtle
>> bug somewhere. Any ideas?
>
> What's the code trying to do? Execute an instruction from a
> non-executable page, trap the #PF, and emulate? And what are the
> symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
>
The symptom is that some kind of access to a page that's supposed to
be readable, NX is reporting error 5. I'm not quite sure what kind of
access is causing that.
>>
>> I'll try to reproduce on a non-ept host later on, but that will
>> involve finding one.
>
> rmmod kvm-intel
> modprobe kvm-intel ept=0
I just tried that and still can't reproduce the problem. FWIW, I also
failed to reproduce it on the one RHEL6 machine I have access to.
>
>> Hmm. You don't have ept. If your guest kernel supports paravirt,
>> then you might use the hypercall interface instead of programming the
>> fixmap directly.
>
> There is no hypercall interface for writing page tables in kvm.
Evidently I was looking at the removed kvm_set_pte stuff :)
>
>>
>> >
>> > This is what I get with vsyscall=none, where emulate and native work
>> > fine on the 3.2 kernel on different host hardware, the guest stays the
>> > same:
>> >
>> >
>> > [ 2.874661] debug: unmapping init memory ffffffff8167f000..ffffffff818dc000
>> > [ 2.876778] Write protecting the kernel read-only data: 6144k
>> > [ 2.879111] debug: unmapping init memory ffff880001318000..ffff880001400000
>> > [ 2.881242] debug: unmapping init memory ffff8800015a0000..ffff880001600000
>> > [ 2.884637] init[1] vsyscall attempted with vsyscall=none ip:ffffffffff600400 cs:33 sp:7fff2f48fe18 ax:7fff2f48fe50 si:7fff2f48ff08 di:0
>>
>> This line (vsyscall attempted) means that the emulation worked
>> correctly. Your other traces didn't have it or anything like it,
>> which mostly rules out do_emulate_vsyscall issues.
>>
>
> Can you point me at the code in question?
The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
The bad access is to the vsyscall page.
>
> Amit, a trace would be nice.
The full output from a test boot of my (updated this morning) initramfs here:
http://web.mit.edu/luto/www/linux/vsyscall_initramfs.img
may give a better hint.
The updated code is here:
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Signature of the legacy time() entry point in the vsyscall page. */
typedef time_t (*vsys_time_t)(time_t *);

int main(void)
{
        vsys_time_t vsys_time = (vsys_time_t)(0xffffffffff600400);
        unsigned char *p = (unsigned char *)0xffffffffff600400;
        int i;

        /* Read the vsyscall page as data first ... */
        printf("Will try reading...\n");
        printf("The first few bytes are:\n");
        for (i = 0; i < 16; i++) {
                unsigned char c = p[i];
                printf("%02x ", (int)c);
        }
        printf("\n");

        /* ... then execute from it, so the log shows which step fails. */
        printf("Will try executing...\n");
        printf("The time is %ld\n", (long)( vsys_time(0) ));

        printf("All done\n");

        /* Sleep forever rather than exit. */
        while(1)
                pause();
}
--Andy
On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
> >
> >> So I could have messed up, or there could be a subtle
> >> bug somewhere. Any ideas?
> >
> > What's the code trying to do? Execute an instruction from a
> > non-executable page, trap the #PF, and emulate? And what are the
> > symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
> >
>
> The symptom is that some kind of access to a page that's supposed to
> be readable, NX is reporting error 5. I'm not quite sure what kind of
> access is causing that.
Might it be a fetch access, with kvm forgetting to set bit 4 correctly?
> >
> > Can you point me at the code in question?
>
> The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
> The bad access is to the vsyscall page.
The bad access is on purpose, yes?
From fault.c:

#ifdef CONFIG_X86_64
                /*
                 * Instruction fetch faults in the vsyscall page might need
                 * emulation.
                 */
                if (unlikely((error_code & PF_INSTR) &&
                             ((address & ~0xfff) == VSYSCALL_START))) {
                        if (emulate_vsyscall(regs, address))
                                return;
                }
#endif

so it seems like kvm doesn't set PF_INSTR?
I thought we unit tested that, but maybe not this exact scenario.
--
error compiling committee.c: too many arguments to function
On Thu, Feb 16, 2012 at 9:14 AM, Avi Kivity <[email protected]> wrote:
> On 02/16/2012 06:45 PM, Andy Lutomirski wrote:
>> >
>> >> So I could have messed up, or there could be a subtle
>> >> bug somewhere. Any ideas?
>> >
>> > What's the code trying to do? Execute an instruction from a
>> > non-executable page, trap the #PF, and emulate? And what are the
>> > symptoms? Wrong error code for the #PF? That could easily be a kvm bug.
>> >
>>
>> The symptom is that some kind of access to a page that's supposed to
>> be readable, NX is reporting error 5. I'm not quite sure what kind of
>> access is causing that.
>
> Might it be a fetch access, with kvm forgetting to set bit 4 correctly?
>
>> >
>> > Can you point me at the code in question?
>>
>> The setup code is in arch/x86/kernel/vsyscall_64.c in map_vsyscall.
>> The bad access is to the vsyscall page.
>
> The bad access is on purpose, yes?
>
> From fault.c:
>
> #ifdef CONFIG_X86_64
>                /*
>                 * Instruction fetch faults in the vsyscall page might need
>                 * emulation.
>                 */
>                if (unlikely((error_code & PF_INSTR) &&
>                             ((address & ~0xfff) == VSYSCALL_START))) {
>                        if (emulate_vsyscall(regs, address))
>                                return;
>                }
> #endif
>
> so it seems like kvm doesn't set PF_INSTR?
Yes, this is on purpose, and you're almost certainly right (and I feel
dumb for not figuring this out immediately). The error message is:
segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff103d72f8 error 5
which is garbage. The instruction at 0xffffffffff600400 can't fetch
itself as data and fault on the data access (at least not in 64-bit
mode, as far as I can think of, without evil messing with the TLBs).
So... what do we do about this? This (whitespace-damaged, untested)
patch will probably work around it well enough to boot the system:
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d74824..52b9522 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
* Instruction fetch faults in the vsyscall page might need
* emulation.
*/
- if (unlikely((error_code & PF_INSTR) &&
+ if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
((address & ~0xfff) == VSYSCALL_START))) {
+ WARN_ONCE(!(error_code & PF_INSTR),
+ "Fixing up bogus vsyscall read fault -- "
+ "your hypervisor is buggy.");
if (emulate_vsyscall(regs, address))
return;
}
Before we patch the guest like this, though, it would be nice to know
what hosts are affected. If it's just one version of RHEL6, maybe it
makes sense to fix the hypervisor and either leave the guest alone or
just add a warning saying to fix your hypervisor, like:
        WARN_ONCE(address == regs->ip &&
                  !(error_code & (PF_INSTR | PF_WRITE)) &&
                  user_64bit_mode(regs),
                  "Fishy page fault -- you might need to fix your hypervisor");
near some exit path in the page fault handler. The 64-bit check is
because (I think) 32-bit code can mess with regs->ip using a cs offset
in the LDT and trigger the warning at will.
--Andy
On 02/16/2012 07:35 PM, Andy Lutomirski wrote:
> >
> > so it seems like kvm doesn't set PF_INSTR?
>
> Yes, this is on purpose, and you're almost certainly right (and I feel
> dumb for not figuring this out immediately). The error message is:
>
> segfault at ffffffffff600400 ip ffffffffff600400 sp 00007fff103d72f8 error 5
>
> which is garbage. The instruction at 0xffffffffff600400 can't fetch
> itself as data and fault on the data access (at least not in 64-bit
> mode, as far as I can think of, without evil messing with the TLBs).
>
> So... what do we do about this? This (whitespace-damaged, untested)
> patch will probably work around it well enough to boot the system:
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9d74824..52b9522 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -741,8 +741,11 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long
> * Instruction fetch faults in the vsyscall page might need
> * emulation.
> */
> - if (unlikely((error_code & PF_INSTR) &&
> + if (unlikely(address == regs->ip && !(error_code & PF_WRITE) &&
> ((address & ~0xfff) == VSYSCALL_START))) {
> + WARN_ONCE(!(error_code & PF_INSTR),
> + "Fixing up bogus vsyscall read fault -- "
> + "your hypervisor is buggy.");
> if (emulate_vsyscall(regs, address))
> return;
> }
>
> Before we patch the guest like this, though, it would be nice to know
> what hosts are affected. If it's just one version of RHEL6, maybe it
> makes sense to fix the hypervisor and either leave the guest alone or
> just add a warning saying to fix your hypervisor, like:
>
> WARN_ONCE(address == regs->ip && !(error_code & (PF_INSTR | PF_WRITE))
> && user_64bit_mode(regs), "Fishy page fault -- you might need to fix
> your hypervisor");
>
> near some exit path in the page fault handler. The 64-bit check is
> because (I think) 32-bit code can mess with regs->ip using a cs offset
> in the LDT and trigger the warning at will.
>
We'll just fix all affected hypervisor versions. No need to uglify the
guest for a clear kvm bug.
--
error compiling committee.c: too many arguments to function
On 02/16/2012 09:39 AM, Avi Kivity wrote:
>>
>> Yes, this is on purpose
Why?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <[email protected]> wrote:
> On 02/16/2012 09:39 AM, Avi Kivity wrote:
>>>
>>> Yes, this is on purpose
>
> Why?
I think the "this" refers to the PF_INSTR fault when executing at
0xffffffffff600xxx. That's definitely intentional -- it's how
vsyscall emulation works.
I think it's unintentional that some kvm versions apparently forget to
set the PF_INSTR bit.
--Andy
>
>        -hpa
>
>
> --
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel. I don't speak on their behalf.
>
--
Andy Lutomirski
AMA Capital Management, LLC
Office: (310) 553-5322
Mobile: (650) 906-0647
On 02/24/2012 08:58 PM, Andy Lutomirski wrote:
> On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <[email protected]> wrote:
> > On 02/16/2012 09:39 AM, Avi Kivity wrote:
> >>>
> >>> Yes, this is on purpose
> >
> > Why?
>
> I think the "this" refers to the PF_INSTR fault when executing at
> 0xffffffffff600xxx. That's definitely intentional -- it's how
> vsyscall emulation works.
>
> I think it's unintentional that some kvm versions apparently forget to
> set the PF_INSTR bit.
>
Correct. Can you provide the version that failed, so we can fix it?
--
error compiling committee.c: too many arguments to function
On (Tue) 28 Feb 2012 [12:00:34], Avi Kivity wrote:
> On 02/24/2012 08:58 PM, Andy Lutomirski wrote:
> > On Thu, Feb 23, 2012 at 8:34 PM, H. Peter Anvin <[email protected]> wrote:
> > > On 02/16/2012 09:39 AM, Avi Kivity wrote:
> > >>>
> > >>> Yes, this is on purpose
> > >
> > > Why?
> >
> > I think the "this" refers to the PF_INSTR fault when executing at
> > 0xffffffffff600xxx. That's definitely intentional -- it's how
> > vsyscall emulation works.
> >
> > I think it's unintentional that some kvm versions apparently forget to
> > set the PF_INSTR bit.
> >
>
> Correct. Can you provide the version that failed, so we can fix it?
I'm running this on a RHEL host, the version that fails is
2.6.32-220.4.1.el6.x86_64, but can't say when it got introduced,
haven't gone back and checked that.
Amit