2023-02-06 20:49:30

by Stanislav Kinsburskii

Subject: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

From: Stanislav Kinsburskiy <[email protected]>

And have it preset.
This change significantly reduces the time needed to bring up the guest SMP
configuration and ensures that the guest won't get inaccurate calibration
results due to a "noisy neighbour" situation.

Below are the numbers for a 16-vCPU guest before the patch (~1300 msec):

[ 0.562938] x86: Booting SMP configuration:
...
[ 1.859447] smp: Brought up 1 node, 16 CPUs

and after the patch (~130 msec):

[ 0.445079] x86: Booting SMP configuration:
...
[ 0.575035] smp: Brought up 1 node, 16 CPUs

This change is inspired by commit 0293615f3fb9 ("x86: KVM guest: use
paravirt function to calculate cpu khz").

Signed-off-by: Stanislav Kinsburskiy <[email protected]>
CC: "K. Y. Srinivasan" <[email protected]>
CC: Haiyang Zhang <[email protected]>
CC: Wei Liu <[email protected]>
CC: Dexuan Cui <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Borislav Petkov <[email protected]>
CC: Dave Hansen <[email protected]>
CC: [email protected]
CC: "H. Peter Anvin" <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/x86/kernel/cpu/mshyperv.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dedec2f23ad1..0282b2e96cc2 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -320,6 +320,21 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
 }
 #endif

+static void __init __maybe_unused hv_preset_lpj(void)
+{
+	unsigned long khz;
+	u64 lpj;
+
+	if (!x86_platform.calibrate_tsc)
+		return;
+
+	khz = x86_platform.calibrate_tsc();
+
+	lpj = ((u64)khz * 1000);
+	do_div(lpj, HZ);
+	preset_lpj = lpj;
+}
+
 static void __init ms_hyperv_init_platform(void)
 {
 	int hv_max_functions_eax;
@@ -521,6 +536,12 @@ static void __init ms_hyperv_init_platform(void)

 	/* Register Hyper-V specific clocksource */
 	hv_init_clocksource();
+
+	/*
+	 * Preset lpj to make calibrate_delay a no-op, which in turn helps to
+	 * speed up secondary cores initialization.
+	 */
+	hv_preset_lpj();
 #endif
 	/*
 	 * TSC should be marked as unstable only after Hyper-V



2023-02-07 23:24:53

by Nuno Das Neves

Subject: Re: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

Reviewed-by: Nuno Das Neves <[email protected]>

2023-02-13 15:55:19

by Wei Liu

Subject: Re: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

On Tue, Feb 07, 2023 at 03:24:47PM -0800, Nuno Das Neves wrote:
> On 2/6/2023 12:49 PM, Stanislav Kinsburskii wrote:
> > From: Stanislav Kinsburskiy <[email protected]>
> >
> > And have it preset.

In the future please add a blank line between two paragraphs.

> > This change significantly reduces the time needed to bring up the guest SMP
> > configuration and ensures that the guest won't get inaccurate calibration
> > results due to a "noisy neighbour" situation.
> >

This looks like a good idea. 0293615f3fb9 was committed in 2008, so
we're very late to the party. Better late than never though.

If I hear no objections in a few days' time I will apply this to
hyperv-next with Nuno's Rb tag.

Thanks,
Wei.


2023-02-14 16:19:38

by Michael Kelley (LINUX)

Subject: RE: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

From: Stanislav Kinsburskii <[email protected]>
>
> And have it preset.
> This change significantly reduces the time needed to bring up the guest SMP
> configuration and ensures that the guest won't get inaccurate calibration
> results due to a "noisy neighbour" situation.
>
> Below are the numbers for a 16-vCPU guest before the patch (~1300 msec):
>
> [ 0.562938] x86: Booting SMP configuration:
> ...
> [ 1.859447] smp: Brought up 1 node, 16 CPUs
>
> and after the patch (~130 msec):
>
> [ 0.445079] x86: Booting SMP configuration:
> ...
> [ 0.575035] smp: Brought up 1 node, 16 CPUs
>
> This change is inspired by commit 0293615f3fb9 ("x86: KVM guest: use
> paravirt function to calculate cpu khz").

This patch has been nagging at me a bit, and I finally did some further
checking. Looking at Linux guests on local Hyper-V and in Azure, I see
a dmesg output line like this during boot:

Calibrating delay loop (skipped), value calculated using timer frequency.. 5187.81 BogoMIPS (lpj=2593905)

We're already skipping the delay loop calculation because lpj_fine
is set in tsc_init(), using the results of get_loops_per_jiffy(). The
latter does exactly the same calculation as hv_preset_lpj() in
this patch.
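
The arithmetic in question can be sketched in user-space C. This is an
illustration, not kernel code: SKETCH_HZ and the helper names are
hypothetical, and HZ=1000 is an assumption (it is consistent with the
dmesg line above, where lpj=2593905 corresponds to 5187.81 BogoMIPS).
The BogoMIPS split mirrors the lpj/(500000/HZ) form printed by
init/calibrate.c.

```c
#include <stdint.h>

/* Assumed tick rate; in the kernel, HZ is a build-time constant. */
#define SKETCH_HZ 1000ULL

/* lpj = khz * 1000 / HZ, the same arithmetic as hv_preset_lpj(). */
static uint64_t lpj_from_khz(uint64_t khz)
{
	return khz * 1000 / SKETCH_HZ;
}

/* Integer part of the BogoMIPS value derived from lpj. */
static uint64_t bogomips_int(uint64_t lpj)
{
	return lpj / (500000 / SKETCH_HZ);
}

/* Two-digit fractional part of the BogoMIPS value. */
static uint64_t bogomips_frac(uint64_t lpj)
{
	return (lpj / (5000 / SKETCH_HZ)) % 100;
}
```

With khz = 2593905, lpj_from_khz() returns 2593905, and the two BogoMIPS
helpers give 5187 and 81, matching the 5187.81 in the log line.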

Is this patch arising from an environment where tsc_init() is
skipped for some reason? Just trying to make sure we fully
understand when this patch is applicable, and when not.

Michael


2023-02-17 00:01:52

by Stanislav Kinsburskii

Subject: Re: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

On Tue, Feb 14, 2023 at 04:19:13PM +0000, Michael Kelley (LINUX) wrote:
> This patch has been nagging at me a bit, and I finally did some further
> checking. Looking at Linux guests on local Hyper-V and in Azure, I see
> a dmesg output line like this during boot:
>
> Calibrating delay loop (skipped), value calculated using timer frequency.. 5187.81 BogoMIPS (lpj=2593905)
>
> We're already skipping the delay loop calculation because lpj_fine
> is set in tsc_init(), using the results of get_loops_per_jiffy(). The
> latter does exactly the same calculation as hv_preset_lpj() in
> this patch.
>
> Is this patch arising from an environment where tsc_init() is
> skipped for some reason? Just trying to make sure we fully
> understand when this patch is applicable, and when not.
>

The problem here is a bit different: "lpj_fine" is considered only for
the boot CPU (from init/calibrate.c):

	} else if ((!printed) && lpj_fine) {
		lpj = lpj_fine;
		pr_info("Calibrating delay loop (skipped), "
			"value calculated using timer frequency.. ");

while all the secondary ones use the timer to calibrate.

With this change, preset_lpj will be used for all cores (from
init/calibrate.c):

	} else if (preset_lpj) {
		lpj = preset_lpj;
		if (!printed)
			pr_info("Calibrating delay loop (skipped) "
				"preset value.. ");

This logic with lpj_fine comes from commit 3da757daf86e ("x86: use
cpu_khz for loops_per_jiffy calculation"), where the commit message
states the following:

We do this only for the boot processor because the AP's can have
different base frequencies or the BIOS might boot a AP at a different
frequency.
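
The difference between the two branches can be modelled in a few lines
of user-space C. This is a simplified sketch, not the code from
init/calibrate.c: the struct, the enum and pick_lpj_source() are
hypothetical names, first_call stands in for the !printed check, and
every other fallback is collapsed into a single "calibrate" outcome.
The order of the two checks is an assumption.

```c
#include <stdbool.h>

enum lpj_source { LPJ_PRESET, LPJ_FINE, LPJ_CALIBRATE };

struct lpj_inputs {
	unsigned long preset_lpj;	/* non-zero once a platform presets it */
	unsigned long lpj_fine;		/* non-zero once it was derived from the TSC */
	bool first_call;		/* models !printed: true only on the boot CPU */
};

/* Decide where a CPU takes its loops-per-jiffy value from. */
static enum lpj_source pick_lpj_source(const struct lpj_inputs *in)
{
	if (in->preset_lpj)
		return LPJ_PRESET;	/* honoured on every CPU */
	if (in->first_call && in->lpj_fine)
		return LPJ_FINE;	/* honoured on the boot CPU only */
	return LPJ_CALIBRATE;		/* everything else in this model */
}
```

In this model a secondary CPU (first_call == false) ends up in
LPJ_CALIBRATE unless preset_lpj is non-zero, which is the case the patch
addresses.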

Hope this helps.

Thanks,
Stanislav


2023-02-17 02:34:30

by Michael Kelley (LINUX)

Subject: RE: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

From: Stanislav Kinsburskii <[email protected]> Sent: Thursday, February 16, 2023 11:41 AM
>
> The problem here is a bit different: "lpj_fine" is considered only for
> the boot CPU (from init/calibrate.c):
>
> 	} else if ((!printed) && lpj_fine) {
> 		lpj = lpj_fine;
> 		pr_info("Calibrating delay loop (skipped), "
> 			"value calculated using timer frequency.. ");
>
> while all the secondary ones use the timer to calibrate.
>
> With this change, preset_lpj will be used for all cores (from
> init/calibrate.c):
>
> 	} else if (preset_lpj) {
> 		lpj = preset_lpj;
> 		if (!printed)
> 			pr_info("Calibrating delay loop (skipped) "
> 				"preset value.. ");
>
> This logic with lpj_fine comes from commit 3da757daf86e ("x86: use
> cpu_khz for loops_per_jiffy calculation"), where the commit message
> states the following:
>
> We do this only for the boot processor because the AP's can have
> different base frequencies or the BIOS might boot a AP at a different
> frequency.
>
> Hope this helps.
>

Indeed, you are right about lpj_fine being applied only to the boot
CPU. So I've looked a little closer because I don't see the 1300
milliseconds you see for a 16 vCPU guest.

I've been experimenting with a 32 vCPU guest, and without your
patch, it takes only 26 milliseconds to get all 32 vCPUs started. I
think the trick is in the call to calibrate_delay_is_known(). This
function copies the lpj value from a CPU in the same NUMA node
that has already been calibrated, assuming that constant_tsc is
set, which is the case in my test VM. So the boot CPU sets lpj
based on lpj_fine, and all other CPUs effectively copy the value
from the boot CPU without doing calibration.

I also experimented with multiple NUMA nodes. In that case, it
does take longer. Dividing the 32 vCPUs into 4 NUMA nodes,
it takes about 210 milliseconds to boot all 32 vCPUs. Presumably the
extra time is due to timer-based calibration being done once for each
NUMA node, plus probably some misc NUMA accounting overhead.
With preset_lpj set, that 210 milliseconds drops to 32 milliseconds,
which is more like the case with only 1 NUMA node, so there's some
modest benefit with multiple NUMA nodes.
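
A rough way to see how topology changes the number of timer-based
calibrations is the small model below. It is only a sketch of the
behaviour described above, not the kernel's calibrate_delay_is_known():
the function name and parameters are hypothetical, CPU 0 is assumed to
be the boot CPU taking its value from lpj_fine, and share_within_node
stands for the constant_tsc copy path.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the CPUs that would need a timer-based calibration.
 * node_of[i] is the NUMA node of CPU i; CPU 0 never calibrates here.
 */
static size_t count_timer_calibrations(const int *node_of, size_t ncpus,
				       bool share_within_node, bool preset)
{
	size_t calibrations = 0;

	if (preset)
		return 0;	/* every CPU takes the preset value */

	for (size_t cpu = 1; cpu < ncpus; cpu++) {
		bool known = false;

		/* Look for an earlier CPU on the same node to copy from. */
		for (size_t prev = 0; share_within_node && prev < cpu; prev++) {
			if (node_of[prev] == node_of[cpu]) {
				known = true;
				break;
			}
		}
		if (!known)
			calibrations++;
	}
	return calibrations;
}
```

Under these assumptions, 32 vCPUs on one node need no timer-based
calibration, 32 vCPUs split evenly over 4 nodes need 3, and with a
preset value the count is 0 regardless of topology.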

Could you check if constant_tsc is set in your test environment? It
really should be set in a Hyper-V VM.

Michael

2023-02-21 18:56:42

by Stanislav Kinsburskii

Subject: Re: [PATCH] x86/hyperv: Pass on the lpj value from host to guest

On Fri, Feb 17, 2023 at 02:34:21AM +0000, Michael Kelley (LINUX) wrote:
> Indeed, you are right about lpj_fine being applied only to the boot
> CPU. So I've looked a little closer because I don't see the 1300
> milliseconds you see for a 16 vCPU guest.
>
> I've been experimenting with a 32 vCPU guest, and without your
> patch, it takes only 26 milliseconds to get all 32 vCPUs started. I
> think the trick is in the call to calibrate_delay_is_known(). This
> function copies the lpj value from a CPU in the same NUMA node
> that has already been calibrated, assuming that constant_tsc is
> set, which is the case in my test VM. So the boot CPU sets lpj
> based on lpj_fine, and all other CPUs effectively copy the value
> from the boot CPU without doing calibration.
>
> I also experimented with multiple NUMA nodes. In that case, it
> does take longer. Dividing the 32 vCPUs into 4 NUMA nodes,
> it takes about 210 milliseconds to boot all 32 vCPUs. Presumably the
> extra time is due to timer-based calibration being done once for each
> NUMA node, plus probably some misc NUMA accounting overhead.
> With preset_lpj set, that 210 milliseconds drops to 32 milliseconds,
> which is more like the case with only 1 NUMA node, so there's some
> modest benefit with multiple NUMA nodes.
>
> Could you check if constant_tsc is set in your test environment? It
> really should be set in a Hyper-V VM.
>

I guess I should have mentioned that the results presented in the
commit message are from an L2 guest, where there are no NUMA nodes, so
every core is calibrated individually and boot time grows linearly
with the number of cores assigned.
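
For the numbers in the commit message, the per-CPU cost can be estimated
from the two dmesg timestamps. A small sketch, assuming the boot CPU is
already up so that only ncpus - 1 CPUs are brought up in that window;
the helper name is hypothetical, and the average covers all bring-up
work, not calibration alone.

```c
/* Average bring-up time per secondary CPU, in milliseconds. */
static double per_secondary_cpu_ms(double start_s, double end_s,
				   unsigned int ncpus)
{
	if (ncpus < 2)
		return 0.0;	/* no secondary CPUs to bring up */

	return (end_s - start_s) * 1000.0 / (ncpus - 1);
}
```

Before the patch, (1.859447 - 0.562938) s over 15 secondary CPUs is
roughly 86 ms each; after the patch, (0.575035 - 0.445079) s is roughly
8.7 ms each.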

I'm not sure, though, whether NUMA emulation would be the right choice
here, or whether this boot time penalty should be left as is because we
can't guarantee that all the processors are in the same NUMA node, and
thus their lpj values have to be measured.

What do you think, Michael?

Thanks,
Stanislav
