The initial observation was that in PV mode under Xen 32-bit user space
didn't work anymore. Attempted system calls ended in #GP(0x402). All
of a sudden the vector 0x80 handler was not in place anymore. As it
turns out, up to 5.13 redundant initialization did occur: Once from
cpu_initialize_context() (through its VCPUOP_initialise hypercall) and a
2nd time while each CPU was brought fully up. This 2nd initialization is
now gone, uncovering that the 1st one was flawed: Unlike for the
set_trap_table hypercall, a full virtual IDT needs to be specified here;
the "vector" fields of the individual entries are of no interest. With
many (kernel) IDT entries still(?) (i.e. at that point at least) empty,
the syscall vector 0x80 ended up in slot 0x20 of the virtual IDT, thus
becoming the domain's handler for vector 0x20.
Since xen_copy_trap_info() has just this single purpose, simply adjust
that function. xen_convert_trap_info() cannot be used here. Its use
would also have led to a buffer overrun if all (kernel) IDT entries
were populated, due to the function setting a sentinel entry at the end.
(I didn't bother trying to identify the commit which uncovered the issue
in 5.14; the commit named below is the one which actually introduced the
bad code.)
Fixes: f87e4cac4f4e ("xen: SMP guest support")
Cc: [email protected]
Signed-off-by: Jan Beulich <[email protected]>
---
To what extent it is correct to use the current CPU's IDT is unclear to me.
Looks at least like another latent trap.
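To illustrate the failure mode described in the commit message, here is a
minimal user-space sketch. It is not kernel code: struct model_gate,
struct model_trap and all model_* functions are hypothetical stand-ins for
gate_desc, struct trap_info and the conversion helpers, simplified to the
two fields that matter here. It assumes a consumer which, like the
VCPUOP_initialise path described above, indexes the table by slot and
ignores the "vector" field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for gate_desc and struct trap_info. */
struct model_gate { bool present; unsigned long handler; };
struct model_trap { unsigned vector; unsigned long address; };

#define MODEL_IDT_ENTRIES 256

/*
 * Compact form: empty gates are skipped and each emitted entry carries its
 * vector number. This suits a consumer that reads the "vector" field.
 * Returns the number of entries written.
 */
static size_t model_convert_compact(const struct model_gate *idt, size_t count,
                                    struct model_trap *out)
{
    size_t in, n = 0;

    for (in = 0; in < count; in++) {
        if (!idt[in].present)
            continue;
        out[n].vector = (unsigned)in;
        out[n].address = idt[in].handler;
        n++;
    }
    return n;
}

/* Full form: slot i always describes vector i; empty gates get address 0. */
static void model_convert_full(const struct model_gate *idt, size_t count,
                               struct model_trap *out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        out[i].vector = (unsigned)i;
        out[i].address = idt[i].present ? idt[i].handler : 0;
    }
}

/* A consumer that ignores "vector" and looks entries up purely by slot. */
static unsigned long model_lookup_by_slot(const struct model_trap *tbl,
                                          unsigned vector)
{
    return tbl[vector].address;
}
```

With vectors 0x00-0x1f and 0x80 populated, the compact table places the
0x80 handler in slot 0x20, so a slot-indexed lookup of vector 0x80 finds
nothing, while the full table returns the expected handler.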
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -775,8 +775,15 @@ static void xen_convert_trap_info(const
 void xen_copy_trap_info(struct trap_info *traps)
 {
 	const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
+	unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
 
-	xen_convert_trap_info(desc, traps);
+	BUG_ON(count > 256);
+
+	for (i = 0; i < count; ++i) {
+		const gate_desc *entry = (gate_desc *)desc->address + i;
+
+		cvt_gate_to_trap(i, entry, &traps[i]);
+	}
 }
 
 /* Load a new IDT into Xen. In principle this can be per-CPU, so we
On 9/16/21 11:04 AM, Jan Beulich wrote:
> {
> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>
> - xen_convert_trap_info(desc, traps);
Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
-boris
> + BUG_ON(count > 256);
> +
> + for (i = 0; i < count; ++i) {
> + const gate_desc *entry = (gate_desc *)desc->address + i;
> +
> + cvt_gate_to_trap(i, entry, &traps[i]);
> + }
> }
>
> /* Load a new IDT into Xen. In principle this can be per-CPU, so we
>
On 17.09.2021 03:34, Boris Ostrovsky wrote:
>
> On 9/16/21 11:04 AM, Jan Beulich wrote:
>> {
>> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
>> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>>
>> - xen_convert_trap_info(desc, traps);
>
>
> Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
I can, sure, but I specifically didn't, as the result is going to be less
readable imo. Instead I was considering folding xen_convert_trap_info()
into its only remaining caller. Yet if you're convinced adding the
parameter is the way to go, I will go that route. But please confirm.
Jan
On 17.09.21 08:40, Jan Beulich wrote:
> On 17.09.2021 03:34, Boris Ostrovsky wrote:
>>
>> On 9/16/21 11:04 AM, Jan Beulich wrote:
>>> {
>>> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
>>> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>>>
>>> - xen_convert_trap_info(desc, traps);
>>
>>
>> Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
>
> I can, sure, but I specifically didn't, as the result is going to be less
> readable imo. Instead I was considering folding xen_convert_trap_info()
> into its only remaining caller. Yet if you're convinced adding the
> parameter is the way to go, I will go that route. But please confirm.
I don't think the result will be very hard to read. All you need is the
new parameter and extending the if statement in xen_convert_trap_info()
to increment out always if no entry is to be skipped.
Juergen
On 17.09.21 08:50, Jan Beulich wrote:
> On 17.09.2021 08:47, Juergen Gross wrote:
>> On 17.09.21 08:40, Jan Beulich wrote:
>>> On 17.09.2021 03:34, Boris Ostrovsky wrote:
>>>>
>>>> On 9/16/21 11:04 AM, Jan Beulich wrote:
>>>>> {
>>>>> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
>>>>> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>>>>>
>>>>> - xen_convert_trap_info(desc, traps);
>>>>
>>>>
>>>> Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
>>>
>>> I can, sure, but I specifically didn't, as the result is going to be less
>>> readable imo. Instead I was considering folding xen_convert_trap_info()
>>> into its only remaining caller. Yet if you're convinced adding the
>>> parameter is the way to go, I will go that route. But please confirm.
>>
>> I don't think the result will be very hard to read. All you need is the
>> new parameter and extending the if statement in xen_convert_trap_info()
>> to increment out always if no entry is to be skipped.
>
> And skip writing the sentinel.
Maybe it would be even better then to let xen_convert_trap_info() return
the number of entries written and to write the sentinel in
xen_load_idt() instead, as this is the only place where it is needed.
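
Roughly, as a user-space sketch only: the model_* names and the simplified
structs below are hypothetical stand-ins, not the kernel's gate_desc,
struct trap_info or cvt_gate_to_trap(). The "full" flag is the boolean
parameter suggested earlier in the thread, and the caller is assumed to
provide a zeroed buffer, since in full mode empty slots are left untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for gate_desc and struct trap_info. */
struct model_gate { bool present; unsigned long handler; };
struct model_trap { unsigned vector; unsigned long address; };

#define MODEL_IDT_ENTRIES 256

/* Returns nonzero if the gate was converted, 0 if it is empty. */
static int model_cvt_gate(unsigned vector, const struct model_gate *gate,
                          struct model_trap *trap)
{
    if (!gate->present)
        return 0;
    trap->vector = vector;
    trap->address = gate->handler;
    return 1;
}

/*
 * Convert an IDT into a trap table and return the number of slots used.
 * full == false: empty gates are skipped (compact list).
 * full == true:  every gate consumes a slot, so slot i describes vector i.
 * No sentinel is written here; that is left to callers that need one.
 */
static size_t model_convert_trap_info(const struct model_gate *idt,
                                      size_t count, struct model_trap *traps,
                                      bool full)
{
    size_t in, out = 0;

    assert(count <= MODEL_IDT_ENTRIES);

    for (in = 0; in < count; in++) {
        if (model_cvt_gate((unsigned)in, &idt[in], &traps[out]) || full)
            out++;
    }
    return out;
}

/*
 * The one caller that wants a terminated compact list appends the sentinel
 * itself. traps must have room for count + 1 entries.
 */
static size_t model_load_idt(const struct model_gate *idt, size_t count,
                             struct model_trap *traps)
{
    size_t out = model_convert_trap_info(idt, count, traps, false);

    traps[out].vector = 0;
    traps[out].address = 0;
    return out;
}
```

This way the full-table caller never writes past entry 255, and only the
compact-list caller needs the extra slot for the terminator.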
Juergen
On 17.09.2021 08:47, Juergen Gross wrote:
> On 17.09.21 08:40, Jan Beulich wrote:
>> On 17.09.2021 03:34, Boris Ostrovsky wrote:
>>>
>>> On 9/16/21 11:04 AM, Jan Beulich wrote:
>>>> {
>>>> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
>>>> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>>>>
>>>> - xen_convert_trap_info(desc, traps);
>>>
>>>
>>> Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
>>
>> I can, sure, but I specifically didn't, as the result is going to be less
>> readable imo. Instead I was considering folding xen_convert_trap_info()
>> into its only remaining caller. Yet if you're convinced adding the
>> parameter is the way to go, I will go that route. But please confirm.
>
> I don't think the result will be very hard to read. All you need is the
> new parameter and extending the if statement in xen_convert_trap_info()
> to increment out always if no entry is to be skipped.
And skip writing the sentinel.
Jan
On 9/17/21 3:24 AM, Juergen Gross wrote:
> On 17.09.21 08:50, Jan Beulich wrote:
>> On 17.09.2021 08:47, Juergen Gross wrote:
>>> On 17.09.21 08:40, Jan Beulich wrote:
>>>> On 17.09.2021 03:34, Boris Ostrovsky wrote:
>>>>>
>>>>> On 9/16/21 11:04 AM, Jan Beulich wrote:
>>>>>> {
>>>>>> const struct desc_ptr *desc = this_cpu_ptr(&idt_desc);
>>>>>> + unsigned i, count = (desc->size + 1) / sizeof(gate_desc);
>>>>>> - xen_convert_trap_info(desc, traps);
>>>>>
>>>>>
>>>>> Can you instead add a boolean parameter to xen_convert_trap_info() to indicate whether to skip empty entries? That will avoid (almost) duplicating the code.
>>>>
>>>> I can, sure, but I specifically didn't, as the result is going to be less
>>>> readable imo. Instead I was considering folding xen_convert_trap_info()
>>>> into its only remaining caller. Yet if you're convinced adding the
>>>> parameter is the way to go, I will go that route. But please confirm.
Yes, that would be my preference. No preference on where to set the sentinel.
Thanks.
-boris
>>>
>>> I don't think the result will be very hard to read. All you need is the
>>> new parameter and extending the if statement in xen_convert_trap_info()
>>> to increment out always if no entry is to be skipped.
>>
>> And skip writing the sentinel.
>
> Maybe it would be even better then to let xen_convert_trap_info() return
> the number of entries written and to write the sentinel in
> xen_load_idt() instead, as this is the only place where it is needed.
>
>
> Juergen