2013-10-03 08:24:46

by Frediano Ziglio

[permalink] [raw]
Subject: [PATCH] xen: Fix possible user space selector corruption

Due to the way kernel is initialized under Xen is possible that the ring1
selector used by the kernel for the boot cpu end up to be copied to
userspace leading to segmentation fault in the userspace.


Xen code in the kernel initialize no-boot cpus with correct selectors (ds
and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
On task context switch (switch_to) we assume that ds, es and cs already
point to __USER_DS and __KERNEL_CSso these selector are not changed.

If processor is an Intel that support sysenter instruction sysenter/sysexit
is used so ds and es are not restored switching back from kernel to
userspace. In the case the selectors point to a ring1 instead of __USER_DS
the userspace code will crash on first memory access attempt (to be
precise Xen on the emulated iret used to do sysexit will detect and set ds
and es to zero which lead to GPF anyway).

Now if an userspace process call kernel using sysenter and get rescheduled
(for me it happen on a specific init calling wait4) could happen that the
ring1 selector is set to ds and es.

This is quite hard to detect cause after a while these selectors are fixed
(__USER_DS seems sticky).

Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
to be the first one that have this issue.

Signed-off-by: Frediano Ziglio <[email protected]>
---
arch/x86/xen/smp.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index d1e4777..2a47241 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -278,6 +278,18 @@ static void __init xen_smp_prepare_boot_cpu(void)
old memory can be recycled */
make_lowmem_page_readwrite(xen_initial_gdt);

+#ifdef CONFIG_X86_32
+ /*
+ * Assure we use segments with user level access.
+ * During switching of task these segments got not reloaded
+ * so it could happen that userspace tasks get Xen ring1
+ * selector causing exit with sysenter failures on next
+ * userspace memory operation.
+ */
+ loadsegment(ds, __USER_DS);
+ loadsegment(es, __USER_DS);
+#endif
+
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
}
--
1.7.10.4


2013-10-03 09:47:50

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On 03/10/13 09:24, Frediano Ziglio wrote:
> Due to the way kernel is initialized under Xen is possible that the ring1
> selector used by the kernel for the boot cpu end up to be copied to
> userspace leading to segmentation fault in the userspace.
>
>
> Xen code in the kernel initialize no-boot cpus with correct selectors (ds
> and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
> On task context switch (switch_to) we assume that ds, es and cs already
> point to __USER_DS and __KERNEL_CSso these selector are not changed.
>
> If processor is an Intel that support sysenter instruction sysenter/sysexit
> is used so ds and es are not restored switching back from kernel to
> userspace. In the case the selectors point to a ring1 instead of __USER_DS
> the userspace code will crash on first memory access attempt (to be
> precise Xen on the emulated iret used to do sysexit will detect and set ds
> and es to zero which lead to GPF anyway).
>
> Now if an userspace process call kernel using sysenter and get rescheduled
> (for me it happen on a specific init calling wait4) could happen that the
> ring1 selector is set to ds and es.
>
> This is quite hard to detect cause after a while these selectors are fixed
> (__USER_DS seems sticky).
>
> Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
> to be the first one that have this issue.
>
> Signed-off-by: Frediano Ziglio <[email protected]>

Looks okay to me but I don't really understand it...

David

2013-10-03 10:04:37

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On 03/10/13 09:24, Frediano Ziglio wrote:
> Due to the way kernel is initialized under Xen is possible that the ring1
> selector used by the kernel for the boot cpu end up to be copied to
> userspace leading to segmentation fault in the userspace.
>
>
> Xen code in the kernel initialize no-boot cpus with correct selectors (ds
> and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
> On task context switch (switch_to) we assume that ds, es and cs already
> point to __USER_DS and __KERNEL_CSso these selector are not changed.
>
> If processor is an Intel that support sysenter instruction sysenter/sysexit
> is used so ds and es are not restored switching back from kernel to
> userspace. In the case the selectors point to a ring1 instead of __USER_DS
> the userspace code will crash on first memory access attempt (to be
> precise Xen on the emulated iret used to do sysexit will detect and set ds
> and es to zero which lead to GPF anyway).
>
> Now if an userspace process call kernel using sysenter and get rescheduled
> (for me it happen on a specific init calling wait4) could happen that the
> ring1 selector is set to ds and es.
>
> This is quite hard to detect cause after a while these selectors are fixed
> (__USER_DS seems sticky).
>
> Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
> to be the first one that have this issue.
>
> Signed-off-by: Frediano Ziglio <[email protected]>

In terms of the correctness of the fix,

Reviewed-by: Andrew Cooper <[email protected]>

However, I am not sure the comment is necessary. The prevailing style
is for no justification of loads of segment selectors on boot, and the
comment itself refers simply to an interaction issue of 32bit on Xen
when making use of sysenter.

> ---
> arch/x86/xen/smp.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index d1e4777..2a47241 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -278,6 +278,18 @@ static void __init xen_smp_prepare_boot_cpu(void)
> old memory can be recycled */
> make_lowmem_page_readwrite(xen_initial_gdt);
>
> +#ifdef CONFIG_X86_32
> + /*
> + * Assure we use segments with user level access.
> + * During switching of task these segments got not reloaded
> + * so it could happen that userspace tasks get Xen ring1
> + * selector causing exit with sysenter failures on next
> + * userspace memory operation.
> + */
> + loadsegment(ds, __USER_DS);
> + loadsegment(es, __USER_DS);
> +#endif
> +
> xen_filter_cpu_maps();
> xen_setup_vcpu_info_placement();
> }

2013-10-03 13:08:04

by Frediano Ziglio

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On Thu, 2013-10-03 at 11:04 +0100, Andrew Cooper wrote:
> On 03/10/13 09:24, Frediano Ziglio wrote:
> > Due to the way kernel is initialized under Xen is possible that the ring1
> > selector used by the kernel for the boot cpu end up to be copied to
> > userspace leading to segmentation fault in the userspace.
> >
> >
> > Xen code in the kernel initialize no-boot cpus with correct selectors (ds
> > and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
> > On task context switch (switch_to) we assume that ds, es and cs already
> > point to __USER_DS and __KERNEL_CSso these selector are not changed.
> >
> > If processor is an Intel that support sysenter instruction sysenter/sysexit
> > is used so ds and es are not restored switching back from kernel to
> > userspace. In the case the selectors point to a ring1 instead of __USER_DS
> > the userspace code will crash on first memory access attempt (to be
> > precise Xen on the emulated iret used to do sysexit will detect and set ds
> > and es to zero which lead to GPF anyway).
> >
> > Now if an userspace process call kernel using sysenter and get rescheduled
> > (for me it happen on a specific init calling wait4) could happen that the
> > ring1 selector is set to ds and es.
> >
> > This is quite hard to detect cause after a while these selectors are fixed
> > (__USER_DS seems sticky).
> >
> > Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
> > to be the first one that have this issue.
> >
> > Signed-off-by: Frediano Ziglio <[email protected]>
>
> In terms of the correctness of the fix,
>
> Reviewed-by: Andrew Cooper <[email protected]>
>
> However, I am not sure the comment is necessary. The prevailing style
> is for no justification of loads of segment selectors on boot, and the
> comment itself refers simply to an interaction issue of 32bit on Xen
> when making use of sysenter.
>

Suggestion for the comment ??

Frediano

> > ---
> > arch/x86/xen/smp.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> > index d1e4777..2a47241 100644
> > --- a/arch/x86/xen/smp.c
> > +++ b/arch/x86/xen/smp.c
> > @@ -278,6 +278,18 @@ static void __init xen_smp_prepare_boot_cpu(void)
> > old memory can be recycled */
> > make_lowmem_page_readwrite(xen_initial_gdt);
> >
> > +#ifdef CONFIG_X86_32
> > + /*
> > + * Assure we use segments with user level access.
> > + * During switching of task these segments got not reloaded
> > + * so it could happen that userspace tasks get Xen ring1
> > + * selector causing exit with sysenter failures on next
> > + * userspace memory operation.
> > + */
> > + loadsegment(ds, __USER_DS);
> > + loadsegment(es, __USER_DS);
> > +#endif
> > +
> > xen_filter_cpu_maps();
> > xen_setup_vcpu_info_placement();
> > }
>

2013-10-04 13:20:44

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On Thu, Oct 03, 2013 at 01:51:32PM +0100, Frediano Ziglio wrote:
> On Thu, 2013-10-03 at 11:04 +0100, Andrew Cooper wrote:
> > On 03/10/13 09:24, Frediano Ziglio wrote:
> > > Due to the way kernel is initialized under Xen is possible that the ring1
> > > selector used by the kernel for the boot cpu end up to be copied to
> > > userspace leading to segmentation fault in the userspace.
> > >
> > >
> > > Xen code in the kernel initialize no-boot cpus with correct selectors (ds
> > > and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
> > > On task context switch (switch_to) we assume that ds, es and cs already
> > > point to __USER_DS and __KERNEL_CSso these selector are not changed.
> > >
> > > If processor is an Intel that support sysenter instruction sysenter/sysexit
> > > is used so ds and es are not restored switching back from kernel to
> > > userspace. In the case the selectors point to a ring1 instead of __USER_DS
> > > the userspace code will crash on first memory access attempt (to be
> > > precise Xen on the emulated iret used to do sysexit will detect and set ds
> > > and es to zero which lead to GPF anyway).
> > >
> > > Now if an userspace process call kernel using sysenter and get rescheduled
> > > (for me it happen on a specific init calling wait4) could happen that the
> > > ring1 selector is set to ds and es.
> > >
> > > This is quite hard to detect cause after a while these selectors are fixed
> > > (__USER_DS seems sticky).
> > >
> > > Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
> > > to be the first one that have this issue.
> > >
> > > Signed-off-by: Frediano Ziglio <[email protected]>
> >
> > In terms of the correctness of the fix,
> >
> > Reviewed-by: Andrew Cooper <[email protected]>

Should this also go in stable tree?
> >
> > However, I am not sure the comment is necessary. The prevailing style
> > is for no justification of loads of segment selectors on boot, and the
> > comment itself refers simply to an interaction issue of 32bit on Xen
> > when making use of sysenter.
> >
>
> Suggestion for the comment ??
>
> Frediano
>
> > > ---
> > > arch/x86/xen/smp.c | 12 ++++++++++++
> > > 1 file changed, 12 insertions(+)
> > >
> > > diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> > > index d1e4777..2a47241 100644
> > > --- a/arch/x86/xen/smp.c
> > > +++ b/arch/x86/xen/smp.c
> > > @@ -278,6 +278,18 @@ static void __init xen_smp_prepare_boot_cpu(void)
> > > old memory can be recycled */
> > > make_lowmem_page_readwrite(xen_initial_gdt);
> > >
> > > +#ifdef CONFIG_X86_32
> > > + /*
> > > + * Assure we use segments with user level access.
> > > + * During switching of task these segments got not reloaded
> > > + * so it could happen that userspace tasks get Xen ring1
> > > + * selector causing exit with sysenter failures on next
> > > + * userspace memory operation.
> > > + */
> > > + loadsegment(ds, __USER_DS);
> > > + loadsegment(es, __USER_DS);
> > > +#endif
> > > +
> > > xen_filter_cpu_maps();
> > > xen_setup_vcpu_info_placement();
> > > }
> >
>
>

2013-10-04 13:51:13

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On 04/10/13 14:20, Konrad Rzeszutek Wilk wrote:
> On Thu, Oct 03, 2013 at 01:51:32PM +0100, Frediano Ziglio wrote:
>> On Thu, 2013-10-03 at 11:04 +0100, Andrew Cooper wrote:
>>> On 03/10/13 09:24, Frediano Ziglio wrote:
>>>>
>>>> Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
>>>> to be the first one that have this issue.
>>>>
>>>> Signed-off-by: Frediano Ziglio <[email protected]>
>>> In terms of the correctness of the fix,
>>>
>>> Reviewed-by: Andrew Cooper <[email protected]>
> Should this also go in stable tree?

Very much so. The change which exposed it for us was from 3.7 iirc, but
I believe it has been a latent bug for as long as the native early boot
code uses __USER_DS.

>>> However, I am not sure the comment is necessary. The prevailing style
>>> is for no justification of loads of segment selectors on boot, and the
>>> comment itself refers simply to an interaction issue of 32bit on Xen
>>> when making use of sysenter.
>>>
>> Suggestion for the comment ??
>>
>> Frediano

My suggestion was to omit the comment entirely, or simplify it to just:

/* Xen starts us with XEN_FLAT_RING1_DS, but linux code expects __USER_DS */

Anyone who wants the full explanation can read the patch description.

~Andrew

2013-10-07 09:39:46

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: Fix possible user space selector corruption

On 04/10/13 14:33, Andrew Cooper wrote:
> On 04/10/13 14:20, Konrad Rzeszutek Wilk wrote:
>> On Thu, Oct 03, 2013 at 01:51:32PM +0100, Frediano Ziglio wrote:
>>> On Thu, 2013-10-03 at 11:04 +0100, Andrew Cooper wrote:
>>>> On 03/10/13 09:24, Frediano Ziglio wrote:
>>>>>
>>>>> Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
>>>>> to be the first one that have this issue.
>>>>>
>>>>> Signed-off-by: Frediano Ziglio <[email protected]>
>>>> In terms of the correctness of the fix,
>>>>
>>>> Reviewed-by: Andrew Cooper <[email protected]>
>> Should this also go in stable tree?
>
> Very much so. The change which exposed it for us was from 3.7 iirc, but
> I believe it has been a latent bug for as long as the native early boot
> code uses __USER_DS.
>
>>>> However, I am not sure the comment is necessary. The prevailing style
>>>> is for no justification of loads of segment selectors on boot, and the
>>>> comment itself refers simply to an interaction issue of 32bit on Xen
>>>> when making use of sysenter.
>>>>
>>> Suggestion for the comment ??
>>>
>>> Frediano
>
> My suggestion was to omit the comment entirely, or simplify it to just:

Suggesting comments should be omitted because other code is poorly
commented seems odd to me.

> /* Xen starts us with XEN_FLAT_RING1_DS, but linux code expects __USER_DS */

I do think this comment better summarizes the reason for loading the
segment registers so I'd prefer this.

David

2013-10-07 09:48:15

by Frediano Ziglio

[permalink] [raw]
Subject: [PATCH v2] xen: Fix possible user space selector corruption

Due to the way kernel is initialized under Xen is possible that the
ring1 selector used by the kernel for the boot cpu end up to be copied
to userspace leading to segmentation fault in the userspace.

Xen code in the kernel initialize no-boot cpus with correct selectors (ds
and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
On task context switch (switch_to) we assume that ds, es and cs already
point to __USER_DS and __KERNEL_CSso these selector are not changed.

If processor is an Intel that support sysenter instruction sysenter/sysexit
is used so ds and es are not restored switching back from kernel to
userspace. In the case the selectors point to a ring1 instead of __USER_DS
the userspace code will crash on first memory access attempt (to be
precise Xen on the emulated iret used to do sysexit will detect and set ds
and es to zero which lead to GPF anyway).

Now if an userspace process call kernel using sysenter and get rescheduled
(for me it happen on a specific init calling wait4) could happen that the
ring1 selector is set to ds and es.

This is quite hard to detect cause after a while these selectors are fixed
(__USER_DS seems sticky).

Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
to be the first one that have this issue.

Signed-off-by: Frediano Ziglio <[email protected]>
---
arch/x86/xen/smp.c | 9 +++++++++
1 file changed, 9 insertions(+)


Just changed comment on source code as suggested by Andrew Cooper.


diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index d99cae8..6d89fcc 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -245,6 +245,15 @@ static void __init xen_smp_prepare_boot_cpu(void)
old memory can be recycled */
make_lowmem_page_readwrite(xen_initial_gdt);

+#ifdef CONFIG_X86_32
+ /*
+ * Xen starts us with XEN_FLAT_RING1_DS, but linux code
+ * expects __USER_DS
+ */
+ loadsegment(ds, __USER_DS);
+ loadsegment(es, __USER_DS);
+#endif
+
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
}
--
1.7.10.4

2013-10-07 09:49:28

by Andrew Cooper

[permalink] [raw]
Subject: Re: [PATCH v2] xen: Fix possible user space selector corruption

On 07/10/13 10:48, Frediano Ziglio wrote:
> Due to the way kernel is initialized under Xen is possible that the
> ring1 selector used by the kernel for the boot cpu end up to be copied
> to userspace leading to segmentation fault in the userspace.
>
> Xen code in the kernel initialize no-boot cpus with correct selectors (ds
> and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
> On task context switch (switch_to) we assume that ds, es and cs already
> point to __USER_DS and __KERNEL_CSso these selector are not changed.
>
> If processor is an Intel that support sysenter instruction sysenter/sysexit
> is used so ds and es are not restored switching back from kernel to
> userspace. In the case the selectors point to a ring1 instead of __USER_DS
> the userspace code will crash on first memory access attempt (to be
> precise Xen on the emulated iret used to do sysexit will detect and set ds
> and es to zero which lead to GPF anyway).
>
> Now if an userspace process call kernel using sysenter and get rescheduled
> (for me it happen on a specific init calling wait4) could happen that the
> ring1 selector is set to ds and es.
>
> This is quite hard to detect cause after a while these selectors are fixed
> (__USER_DS seems sticky).
>
> Bisecting the code commit 7076aada1040de4ed79a5977dbabdb5e5ea5e249 appears
> to be the first one that have this issue.
>
> Signed-off-by: Frediano Ziglio <[email protected]>

Reviewed-by: Andrew Cooper <[email protected]>

> ---
> arch/x86/xen/smp.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
>
> Just changed comment on source code as suggested by Andrew Cooper.
>
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index d99cae8..6d89fcc 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -245,6 +245,15 @@ static void __init xen_smp_prepare_boot_cpu(void)
> old memory can be recycled */
> make_lowmem_page_readwrite(xen_initial_gdt);
>
> +#ifdef CONFIG_X86_32
> + /*
> + * Xen starts us with XEN_FLAT_RING1_DS, but linux code
> + * expects __USER_DS
> + */
> + loadsegment(ds, __USER_DS);
> + loadsegment(es, __USER_DS);
> +#endif
> +
> xen_filter_cpu_maps();
> xen_setup_vcpu_info_placement();
> }