This patch enables parallel microcode loading. To measure the improvement of
parallel over serial loading, we used the following instrumentation diff:
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 577b223..1ea08d8 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -614,17 +614,25 @@ static int __reload_late(void *info)
*/
static int microcode_reload_late(void)
{
+ u64 p0, p1;
int ret;
atomic_set(&late_cpus_in, 0);
atomic_set(&late_cpus_out, 0);
+ p0 = rdtsc_ordered();
+
ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
+
+ p1 = rdtsc_ordered();
+
if (ret > 0)
microcode_check();
pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
+ pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
+
return ret;
}
We used a machine with broken microcode in the BIOS and no microcode in the
initramfs (to bypass early loading).
Here are the results for parallel loading (we made two measurements):
[root@ovs108 ~]# uname -a
Linux ovs108 5.3.0-rc5.master.parallel.el7.dev.x86_64 #1 SMP Thu Aug 22 10:17:04 GMT 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@ovs108 ~]# dmesg | grep microcode
[ 0.000000] Intel Spectre v2 broken microcode detected; disabling Speculation Control
[ 0.197658] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.114135] microcode: sig=0x50654, pf=0x80, revision=0x200003a
[ 2.117555] microcode: Microcode Update Driver: v2.2.
[ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
[ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
[root@ovs108 ~]# dmesg | grep microcode
[ 0.000000] Intel Spectre v2 broken microcode detected; disabling Speculation Control
[ 0.195218] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.111882] microcode: sig=0x50654, pf=0x80, revision=0x200003a
[ 2.115265] microcode: Microcode Update Driver: v2.2.
[ 18.033397] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 18.036590] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 18.036595] microcode: Reload completed, microcode revision: 0x200005e
[ 18.036597] microcode: p0: 118947162428414, p1: 118947191490162, diff: 29061748
Here are the results of serial loading:
[root@ovs108 ~]# uname -a
Linux ovs108 5.3.0-rc5.master.serial.el7.dev.x86_64 #1 SMP Thu Aug 22 12:22:18 GMT 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@ovs108 ~]# dmesg | grep microcode
[ 0.000000] Intel Spectre v2 broken microcode detected; disabling Speculation Control
[ 0.195158] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.111353] microcode: sig=0x50654, pf=0x80, revision=0x200003a
[ 2.114834] microcode: Microcode Update Driver: v2.2.
[ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
[ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
[root@ovs108 ~]# dmesg | grep microcode
[ 0.000000] Intel Spectre v2 broken microcode detected; disabling Speculation Control
[ 0.197158] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.114005] microcode: sig=0x50654, pf=0x80, revision=0x200003a
[ 2.117451] microcode: Microcode Update Driver: v2.2.
[ 17.732026] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 18.041398] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 18.041404] microcode: Reload completed, microcode revision: 0x200005e
[ 18.041407] microcode: p0: 149835792698162, p1: 149836532930286, diff: 740232124
One can see that the difference is more than an order of magnitude: roughly 30M TSC cycles for parallel loading versus 740M-840M for serial loading.
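For readers who want the cycle counts in more familiar units, here is a minimal sketch (a userspace helper, not part of the patch) that computes the serial/parallel ratio from the "diff" values above and converts a cycle count to milliseconds. The TSC frequency is not stated in this thread, so tsc_hz is an assumed parameter, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ratio of serial to parallel stop_machine time, both in TSC cycles. */
static double tsc_speedup(uint64_t serial_cycles, uint64_t parallel_cycles)
{
	assert(parallel_cycles > 0);
	return (double)serial_cycles / (double)parallel_cycles;
}

/*
 * Convert a TSC cycle delta to milliseconds.
 * tsc_hz is an assumption: the thread does not give the TSC frequency,
 * so the caller has to supply it (e.g. from the boot-time TSC calibration).
 */
static double tsc_cycles_to_ms(uint64_t cycles, double tsc_hz)
{
	assert(tsc_hz > 0.0);
	return (double)cycles * 1000.0 / tsc_hz;
}
```

With the first pair of measurements, tsc_speedup(842898034, 29889604) comes out at about 28. If the TSC ran at 2.4 GHz (an assumed value), 29889604 cycles would be about 12.5 ms and 842898034 cycles about 351 ms.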
---
I also tested microcode loading with CPU hotplug.
- I unplugged the last two CPUs (basically the last core with 2 hyperthreads):
[ 1077.756759] IRQ 324: no longer affine to CPU71
[ 1077.756889] IRQ 619: no longer affine to CPU71
[ 1077.756908] IRQ 645: no longer affine to CPU71
[ 1077.761213] smpboot: CPU 71 is now offline
[ 1082.702759] IRQ 289: no longer affine to CPU70
[ 1082.702771] IRQ 305: no longer affine to CPU70
[ 1082.702827] IRQ 521: no longer affine to CPU70
[ 1082.702860] IRQ 636: no longer affine to CPU70
[ 1082.702876] IRQ 679: no longer affine to CPU70
[ 1082.706897] smpboot: CPU 70 is now offline
- I did the microcode update:
[ 1123.818741] microcode: updated to revision 0x200005e, date = 2019-04-02
[ 1123.821013] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
[ 1123.821014] x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
[ 1123.821014] microcode: Reload completed, microcode revision: 0x200005e
[ 1123.821015] microcode: p0: 197460831869308, p1: 197460853607904, diff: 21738596
- Then I onlined CPUs 70 and 71:
[ 1151.170814] smpboot: Booting Node 1 Processor 70 APIC 0x75
[ 1151.177199] microcode: sig=0x50654, pf=0x80, revision=0x200005e
[ 1182.523811] smpboot: Booting Node 1 Processor 71 APIC 0x77
[root@ovs108 ~]# cat /proc/cpuinfo | tr "\t" " " | grep -A 6 "processor : 70" | grep microcode
microcode : 0x200005e
[root@ovs108 ~]# cat /proc/cpuinfo | tr "\t" " " | grep -A 6 "processor : 71" | grep microcode
microcode : 0x200005e
We can see that both CPUs have been updated to the same microcode revision.
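The same check can be run over every CPU at once rather than two at a time. The following is a minimal sketch, assuming /proc/cpuinfo-style input with one "microcode : 0x..." line per logical CPU as in the output above; the function name is hypothetical and this is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/cpuinfo-style text and compare every "microcode" value
 * against the first one seen.
 * Returns the number of CPUs whose revision differs from the first,
 * or -1 if no "microcode" line was found at all.
 */
static int count_microcode_mismatches(FILE *f)
{
	char line[4096], first[64] = "", rev[64];
	int seen = 0, mismatches = 0;

	assert(f != NULL);
	while (fgets(line, sizeof(line), f)) {
		/* Lines of interest look like "microcode\t: 0x200005e". */
		if (strncmp(line, "microcode", 9) != 0)
			continue;
		const char *colon = strchr(line, ':');
		if (!colon || sscanf(colon + 1, "%63s", rev) != 1)
			continue;
		if (!seen++)
			strcpy(first, rev);	/* remember the reference revision */
		else if (strcmp(first, rev) != 0)
			mismatches++;
	}
	return seen ? mismatches : -1;
}
```

Calling it on a FILE opened from /proc/cpuinfo after onlining the CPUs should return 0 when all of them report the same revision.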
Thank you,
Mihai
Ashok Raj (1):
x86/microcode: Update microcode for all cores in parallel
arch/x86/kernel/cpu/microcode/core.c | 44 ++++++++++++++++++++++++-----------
arch/x86/kernel/cpu/microcode/intel.c | 14 ++++-------
2 files changed, 36 insertions(+), 22 deletions(-)
--
1.8.3.1
Hi!
> + u64 p0, p1;
> int ret;
>
> atomic_set(&late_cpus_in, 0);
> atomic_set(&late_cpus_out, 0);
>
> + p0 = rdtsc_ordered();
> +
> ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
> +
> + p1 = rdtsc_ordered();
> +
> if (ret > 0)
> microcode_check();
>
> pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
>
> + pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
> +
> return ret;
> }
>
> We have used a machine with a broken microcode in BIOS and no microcode in
> initramfs (to bypass early loading).
>
> Here are the results for parallel loading (we made two measurements):
> [ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
> [ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> [ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
> [ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
> Here are the results of serial loading:
>
> [ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
> [ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> [ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
> [ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
>
> One can see that the difference is an order of magnitude.
Well, that's impressive, but it seems to finish 300 msec later? Where does that difference
come from / how much real time do you gain by this?
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> On 1 Sep 2019, at 20:25, Pavel Machek <[email protected]> wrote:
>
> Hi!
>
>> + u64 p0, p1;
>> int ret;
>>
>> atomic_set(&late_cpus_in, 0);
>> atomic_set(&late_cpus_out, 0);
>>
>> + p0 = rdtsc_ordered();
>> +
>> ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
>> +
>> + p1 = rdtsc_ordered();
>> +
>> if (ret > 0)
>> microcode_check();
>>
>> pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
>>
>> + pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
>> +
>> return ret;
>> }
>>
>> We have used a machine with a broken microcode in BIOS and no microcode in
>> initramfs (to bypass early loading).
>>
>> Here are the results for parallel loading (we made two measurements):
>
>> [ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
>> [ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
>> [ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
>> [ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
>
>> Here are the results of serial loading:
>>
>> [ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
>> [ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
>> [ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
>> [ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
>>
>> One can see that the difference is an order of magnitude.
>
> Well, that's impressive, but it seems to finish 300 msec later? Where does that difference
> come from / how much real time do you gain by this?
The difference comes from the large number of cores/threads the machine has: 72 in this case, but there are machines with more. As the commit message says, the microcode used to be applied serially, one core at a time; now it is updated in parallel on all cores.
300ms may seem like nothing, but it is enough to disrupt some critical services (e.g. storage): for those 300ms nothing else executes on the CPUs. This time also increases when the machine is fully loaded with guests.
Thanks,
Mihai
>
> Best regards,
> Pavel
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> >> + u64 p0, p1;
> >> int ret;
> >>
> >> atomic_set(&late_cpus_in, 0);
> >> atomic_set(&late_cpus_out, 0);
> >>
> >> + p0 = rdtsc_ordered();
> >> +
> >> ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
> >> +
> >> + p1 = rdtsc_ordered();
> >> +
> >> if (ret > 0)
> >> microcode_check();
> >>
> >> pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
> >>
> >> + pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
> >> +
> >> return ret;
> >> }
> >>
> >> We have used a machine with a broken microcode in BIOS and no microcode in
> >> initramfs (to bypass early loading).
> >>
> >> Here are the results for parallel loading (we made two measurements):
> >
> >> [ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> >> [ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
> >> [ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
> >
> >> Here are the results of serial loading:
> >>
> >> [ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
> >> [ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
> >> [ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
> >> [ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
> >>
> >> One can see that the difference is an order of magnitude.
> >
> > Well, that's impressive, but it seems to finish 300 msec later? Where does that difference
> > come from / how much real time do you gain by this?
>
> The difference comes from the large amount of cores/threads the machine has: 72 in this case, but there are machines with more. As the commit message says initially the microcode was applied serially one by one and now the microcode is updated in parallel on all cores.
>
> 300ms seems nothing but it is enough to cause disruption in some critical services (e.g. storage) - 300ms in which we do not execute anything on CPUs. Also this 300ms is increasing when the machine is fully loaded with guests.
>
Yes, but if you look at the dmesgs I quoted, the parallel microcode update
actually finished 300msec _later_.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 02.09.2019 10:39, Pavel Machek wrote:
> Hi!
>
>>>> + u64 p0, p1;
>>>> int ret;
>>>>
>>>> atomic_set(&late_cpus_in, 0);
>>>> atomic_set(&late_cpus_out, 0);
>>>>
>>>> + p0 = rdtsc_ordered();
>>>> +
>>>> ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
>>>> +
>>>> + p1 = rdtsc_ordered();
>>>> +
>>>> if (ret > 0)
>>>> microcode_check();
>>>>
>>>> pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
>>>>
>>>> + pr_info("p0: %lld, p1: %lld, diff: %lld\n", p0, p1, p1 - p0);
>>>> +
>>>> return ret;
>>>> }
>>>>
>>>> We have used a machine with a broken microcode in BIOS and no microcode in
>>>> initramfs (to bypass early loading).
>>>>
>>>> Here are the results for parallel loading (we made two measurements):
>>>
>>>> [ 18.197760] microcode: updated to revision 0x200005e, date = 2019-04-02
>>>> [ 18.201225] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
>>>> [ 18.201230] microcode: Reload completed, microcode revision: 0x200005e
>>>> [ 18.201232] microcode: p0: 118138123843052, p1: 118138153732656, diff: 29889604
>>>
>>>> Here are the results of serial loading:
>>>>
>>>> [ 17.542518] microcode: updated to revision 0x200005e, date = 2019-04-02
>>>> [ 17.898365] x86/CPU: CPU features have changed after loading microcode, but might not take effect.
>>>> [ 17.898370] microcode: Reload completed, microcode revision: 0x200005e
>>>> [ 17.898372] microcode: p0: 149220216047388, p1: 149221058945422, diff: 842898034
>>>>
>>>> One can see that the difference is an order of magnitude.
>>>
>>> Well, that's impressive, but it seems to finish 300 msec later? Where does that difference
>>> come from / how much real time do you gain by this?
>>
>> The difference comes from the large amount of cores/threads the machine has: 72 in this case, but there are machines with more. As the commit message says initially the microcode was applied serially one by one and now the microcode is updated in parallel on all cores.
>>
>> 300ms seems nothing but it is enough to cause disruption in some critical services (e.g. storage) - 300ms in which we do not execute anything on CPUs. Also this 300ms is increasing when the machine is fully loaded with guests.
>>
>
> Yes, but if you look at the dmesgs I quoted, the parallel microcode update
> actually finished 300msec _later_.
That is the serial loading (it follows the line "Here are the results
of serial loading:"); the parallel results come before it. Am I missing something?
> Pavel
>
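The dmesg timestamps are seconds since boot, and the parallel and serial runs come from separate boots of different kernels (see the uname output), so their absolute values are not directly comparable. What can be compared is the gap between the "updated to revision" and "Reload completed" lines within each boot. The following is a minimal sketch of that calculation, assuming the usual "[ seconds.micros]" dmesg prefix; the function names are hypothetical, and the gap only approximates the update time since it depends on where the two messages are printed.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "[ seconds.micros]" prefix of a dmesg line; returns 0 on success. */
static int dmesg_timestamp(const char *line, double *seconds)
{
	assert(line != NULL && seconds != NULL);
	return sscanf(line, " [ %lf ]", seconds) == 1 ? 0 : -1;
}

/*
 * Elapsed milliseconds between two dmesg lines from the same boot.
 * Returns -1.0 if either line has no parsable timestamp.
 */
static double dmesg_gap_ms(const char *first, const char *second)
{
	double t0, t1;

	if (dmesg_timestamp(first, &t0) || dmesg_timestamp(second, &t1))
		return -1.0;
	return (t1 - t0) * 1000.0;
}
```

Applied to the first run of each kind, the gap is about 3.5 ms for parallel loading (18.197760 to 18.201230) and about 356 ms for serial loading (17.542518 to 17.898370).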