2016-10-05 17:15:03

by Boris Ostrovsky

Subject: [PATCH] xen/x86: Update topology map for PV VCPUs

Early during boot, topology_update_package_map() computes
logical_pkg_ids for all present processors.

Later, when processors are brought up, identify_cpu() updates
these values based on phys_pkg_id which is a function of
initial_apicid. On PV guests the latter may point to a
non-existing node, causing logical_pkg_ids to be set to -1.

Intel's RAPL uses logical_pkg_id (via topology_logical_package_id())
to index its arrays and in this case will therefore use index
65535 (since logical_pkg_id is a u16). This can lead either to a
crash or to an access to a random memory location.

As a workaround, we recompute topology during CPU bringup to reset
logical_pkg_id to a valid value.

(initial_apicid is bogus because it is the initial_apicid of the
processor from which the guest is launched. This value is
CPUID(1).EBX[31:24].)

Signed-off-by: Boris Ostrovsky <[email protected]>
Cc: [email protected]
---

Copying Andrew for the CPUID part.

arch/x86/xen/smp.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 311acad..9fa27ce 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -87,6 +87,12 @@ static void cpu_bringup(void)
 	cpu_data(cpu).x86_max_cores = 1;
 	set_cpu_sibling_map(cpu);

+	/*
+	 * identify_cpu() may have set logical_pkg_id to -1 due
+	 * to incorrect phys_proc_id. Let's re-compute it.
+	 */
+	topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
+
 	xen_setup_cpu_clockevents();

 	notify_cpu_starting(cpu);
--
1.8.3.1
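
The wraparound described in the commit message can be illustrated in plain C. This is a minimal userspace sketch, not kernel code: the `u16` typedef stands in for the kernel type, and `store_pkg_id()` and `pkg_index_valid()` are hypothetical helpers introduced only for this example. It shows that storing -1 in a 16-bit unsigned field yields 65535, and that a simple bounds check would catch the bad index before it is used on an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's u16 type (assumption for this sketch). */
typedef uint16_t u16;

/*
 * Hypothetical helper: store a lookup result in a u16 field, as the
 * commit message describes for logical_pkg_id. A "not found" result
 * of -1 is converted modulo 2^16 and becomes 65535.
 */
static u16 store_pkg_id(int lookup_result)
{
	return (u16)lookup_result;
}

/*
 * Hypothetical guard: returns 1 if idx can safely index an array
 * holding nr_pkgs entries, 0 otherwise.
 */
static int pkg_index_valid(u16 idx, size_t nr_pkgs)
{
	return idx < nr_pkgs;
}
```

With a two-package array, `pkg_index_valid(store_pkg_id(-1), 2)` is 0, which is the case a consumer such as RAPL would hit without the recomputation done by the patch.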


2016-10-05 17:41:47

by Andrew Cooper

Subject: Re: [PATCH] xen/x86: Update topology map for PV VCPUs

On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
>
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
>
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])
>
> Signed-off-by: Boris Ostrovsky <[email protected]>
> Cc: [email protected]
> ---
>
> Copying Andrew for the CPUID part.

Yeah - that leaf is usually fiction. (Specifically, the fiction of
whichever cpu a specific toolstack function happened to sample at the
point in time that it was choosing which cpuid values to fake up for the
guest).

I am currently working on fixing the reported topology information to be
architecturally plausible, but current and previous hypervisors will be
wrong.

~Andrew

>
> arch/x86/xen/smp.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 311acad..9fa27ce 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -87,6 +87,12 @@ static void cpu_bringup(void)
> cpu_data(cpu).x86_max_cores = 1;
> set_cpu_sibling_map(cpu);
>
> + /*
> + * identify_cpu() may have set logical_pkg_id to -1 due
> + * to incorrect phys_proc_id. Let's re-compute it.
> + */
> + topology_update_package_map(apic->cpu_present_to_apicid(cpu), cpu);
> +
> xen_setup_cpu_clockevents();
>
> notify_cpu_starting(cpu);
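
For reference, the field discussed above, CPUID(1).EBX[31:24], is the top byte of the EBX value returned by CPUID leaf 1. The sketch below only shows the bit extraction. `initial_apicid_from_ebx()` is a hypothetical helper name, and the EBX value is passed in as a plain integer, so nothing here executes CPUID or says anything about what a given hypervisor reports.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract bits 31:24 (the initial APIC ID field)
 * from an EBX value obtained from CPUID leaf 1.
 */
static uint8_t initial_apicid_from_ebx(uint32_t ebx)
{
	return (uint8_t)(ebx >> 24);
}
```

On a PV guest the extracted value reflects whatever the toolstack sampled, which is why the patch does not rely on it.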

2016-10-06 12:14:16

by Jan Beulich

Subject: Re: [Xen-devel] [PATCH] xen/x86: Update topology map for PV VCPUs

>>> On 05.10.16 at 19:09, <[email protected]> wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.

Another clear indication that such fields should never be touched
(and hence consumers should be either fixed or disabled) when running
as a PV guest under Xen.

Jan

2016-10-06 14:12:27

by David Vrabel

Subject: Re: [Xen-devel] [PATCH] xen/x86: Update topology map for PV VCPUs

On 05/10/16 18:09, Boris Ostrovsky wrote:
> Early during boot topology_update_package_map() computes
> logical_pkg_ids for all present processors.
>
> Later, when processors are brought up, identify_cpu() updates
> these values based on phys_pkg_id which is a function of
> initial_apicid. On PV guests the latter may point to a
> non-existing node, causing logical_pkg_ids to be set to -1.
>
> Intel's RAPL uses logical_pkg_id (as topology_logical_package_id())
> to index its arrays and therefore in this case will point to index
> 65535 (since logical_pkg_id is a u16). This could lead to either a
> crash or may actually access random memory location.
>
> As a workaround, we recompute topology during CPU bringup to reset
> logical_pkg_id to a valid value.
>
> (The reason for initial_apicid being bogus is because it is
> initial_apicid of the processor from which the guest is launched.
> This value is CPUID(1).EBX[31:24])

Applied to for-linus-4.9, thanks.

David