2018-01-09 22:29:25

by Gabriel C

[permalink] [raw]
Subject: AMD EPYC microcode update bug?

Hello,

I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.

I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.

The first run used dracut's early microcode option [1], so the microcode
was loaded from an initrd. The driver reported 63 updated CPUs, while CPU0
still had the old microcode.


snip

crazy@ant:~/fw$ dmesg | grep microcode
[ 2.615876] microcode: microcode updated early to new patch_level=0x08001213
[ 2.615906] microcode: CPU0: patch_level=0x08001207
[ 2.615920] microcode: CPU1: patch_level=0x08001213

...

crazy@ant:~/fw$ cat /proc/cpuinfo | head -n 30
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7281 16-Core Processor
stepping : 2
microcode : 0x8001207

....

After reloading the microcode with

echo 1 > /sys/devices/system/cpu/microcode/reload


CPU0 got the new microcode too.
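
To check every logical CPU rather than only the first few entries, the microcode field of /proc/cpuinfo can be grouped by revision. This is a minimal Python sketch and not part of the original report. The function name microcode_by_revision is hypothetical, and the sketch assumes the "processor" and "microcode" fields look like the excerpt above.

```python
from collections import defaultdict


def microcode_by_revision(cpuinfo_text):
    """Map each microcode revision to the logical CPUs that report it."""
    groups = defaultdict(list)
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            # The field is printed in hex, e.g. 0x8001207.
            groups[int(value, 16)].append(cpu)
    return dict(groups)
```

Passing the contents of /proc/cpuinfo to this function should yield a single key when all CPUs run the same patch level. More than one key points at a CPU that was missed, as CPU0 was in the first run.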

Next I ran the same test without early microcode loading from the initrd,
and with CONFIG_EXTRA_FIRMWARE set like this:

CONFIG_EXTRA_FIRMWARE="amd-ucode/microcode_amd.bin amd-ucode/microcode_amd_fam15h.bin amd-ucode/microcode_amd_fam16h.bin amd-ucode/microcode_amd_fam17h.bin"


This time all CPUs were updated correctly, without needing to reload the microcode.

Is this some sort of timing problem?


I also noticed that on an Intel system 'early updating' really means early:
it is the first thing I see in dmesg, while on an AMD system it seems to
fire much later. Why is that?


Regards,

Gabriel C

1. Fix for Fam17 microcode:
https://github.com/dracutdevs/dracut/commit/19453dc8744e6a59725c43b61b2e3db01cb4c57c#diff-bf0c6db1d4aaaa22a88b2649ddbfcd2a


2018-01-09 22:47:42

by Tom Lendacky

[permalink] [raw]
Subject: Re: AMD EPYC microcode update bug?

On 1/9/2018 4:28 PM, Gabriel C wrote:
> Hello,
>
> I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.
>
> I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.
>
> The first run used dracut's early microcode option [1], so the microcode
> was loaded from an initrd. The driver reported 63 updated CPUs, while CPU0
> still had the old microcode.

I'm guessing that memory encryption is enabled, correct? I've submitted a
patch series to perform early initrd decryption for just this problem. I'm
incorporating some minor feedback and getting ready to submit the next
version.

In the meantime, if you specify mem_encrypt=off on the kernel command line
it should show CPU0 updated properly (with mem_encrypt=on and SMT enabled,
I believe it really does get updated when the sibling thread is updated -
do a rdmsr of 0x0000008b to verify).
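
The rdmsr check can also be done through the msr character device. This is a minimal Python sketch, not code from this thread. It assumes the msr kernel module is loaded, the script runs as root, and the patch level sits in the low 32 bits of MSR 0x8b. The helper names read_msr and patch_level are hypothetical.

```python
import os
import struct

MSR_PATCH_LEVEL = 0x8B  # the register number given in the message above


def read_msr(path, reg):
    """Read one 64-bit MSR from an msr-style device file.

    The msr driver takes the file offset as the register number.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path}")
    return struct.unpack("<Q", raw)[0]


def patch_level(cpu):
    """Return the patch level a logical CPU is running (assumed low 32 bits)."""
    return read_msr(f"/dev/cpu/{cpu}/msr", MSR_PATCH_LEVEL) & 0xFFFFFFFF
```

Comparing hex(patch_level(0)) with the value read on CPU0's SMT sibling would show whether CPU0 is really still on 0x08001207 or whether only the reported value is stale.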

Thanks,
Tom


2018-01-09 23:07:19

by Gabriel C

[permalink] [raw]
Subject: Re: AMD EPYC microcode update bug?

2018-01-09 23:47 GMT+01:00 Tom Lendacky <[email protected]>:
> On 1/9/2018 4:28 PM, Gabriel C wrote:
>> Hello,
>>
>> I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.
>>
>> I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.
>>
>> The first run used dracut's early microcode option [1], so the microcode
>> was loaded from an initrd. The driver reported 63 updated CPUs, while CPU0
>> still had the old microcode.
>
> I'm guessing that memory encryption is enabled, correct? I've submitted a
> patch series to perform early initrd decryption for just this problem. I'm
> incorporating some minor feedback and getting ready to submit the next
> version.

Yes, that is correct. I use mem_encrypt=on, and SMT is enabled in the BIOS.

Can you give me a link to the patch series?

>
> In the meantime, if you specify mem_encrypt=off on the kernel command line
> it should show CPU0 updated properly (with mem_encrypt=on and SMT enabled,
> I believe it really does get updated when the sibling thread is updated -
> do a rdmsr of 0x0000008b to verify).
>

I'll give that a test in a bit; the box is currently running some tests for a
different EPYC issue :)

( https://community.amd.com/thread/224000 )

Regards,

Gabriel C

2018-01-09 23:37:53

by Gabriel C

[permalink] [raw]
Subject: Re: AMD EPYC microcode update bug?

On 10.01.2018 00:06, Gabriel C wrote:
> 2018-01-09 23:47 GMT+01:00 Tom Lendacky <[email protected]>:
>> On 1/9/2018 4:28 PM, Gabriel C wrote:
>>> Hello,
>>>
>>> I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.
>>>
>>> I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.
>>>
>>> The first run used dracut's early microcode option [1], so the microcode
>>> was loaded from an initrd. The driver reported 63 updated CPUs, while CPU0
>>> still had the old microcode.
>>
>> I'm guessing that memory encryption is enabled, correct? I've submitted a
>> patch series to perform early initrd decryption for just this problem. I'm
>> incorporating some minor feedback and getting ready to submit the next
>> version.
>
> Yes, that is correct. I use mem_encrypt=on, and SMT is enabled in the BIOS.
>
> Can you give me a link to the patch series?
>
>>
>> In the meantime, if you specify mem_encrypt=off on the kernel command line
>> it should show CPU0 updated properly (with mem_encrypt=on and SMT enabled,
>> I believe it really does get updated when the sibling thread is updated -
>> do a rdmsr of 0x0000008b to verify).
>>
>
> I'll give that a test in a bit; the box is currently running some tests for a
> different EPYC issue :)
>
> ( https://community.amd.com/thread/224000 )
>


With mem_encrypt=off, everything works fine when loading from an initrd with SMT on.
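
When repeating this comparison, it helps to confirm which mem_encrypt value the running kernel was booted with. This is a minimal Python sketch that parses a kernel command line such as the contents of /proc/cmdline. The function name mem_encrypt_setting is hypothetical, and taking the last occurrence of a repeated option is an assumption of the sketch.

```python
def mem_encrypt_setting(cmdline):
    """Return the value of mem_encrypt= from a command line, or None if absent."""
    setting = None
    for token in cmdline.split():
        if token.startswith("mem_encrypt="):
            # Assumption of this sketch: a later occurrence overrides an earlier one.
            setting = token.split("=", 1)[1]
    return setting
```

If the option is absent the function returns None, and whether memory encryption is active then depends on how the kernel was built.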

Regards,

Gabriel C

2018-01-09 23:44:16

by Tom Lendacky

[permalink] [raw]
Subject: Re: AMD EPYC microcode update bug?

On 1/9/2018 5:06 PM, Gabriel C wrote:
> 2018-01-09 23:47 GMT+01:00 Tom Lendacky <[email protected]>:
>> On 1/9/2018 4:28 PM, Gabriel C wrote:
>>> Hello,
>>>
>>> I'm testing an EPYC system right now with 2 EPYC 7281 16-Core Processors.
>>>
>>> I'm on 4.15.0-rc7 and tested an update to microcode_amd_fam17h.bin.
>>>
>>> The first run used dracut's early microcode option [1], so the microcode
>>> was loaded from an initrd. The driver reported 63 updated CPUs, while CPU0
>>> still had the old microcode.
>>
>> I'm guessing that memory encryption is enabled, correct? I've submitted a
>> patch series to perform early initrd decryption for just this problem. I'm
>> incorporating some minor feedback and getting ready to submit the next
>> version.
>
> Yes, that is correct. I use mem_encrypt=on, and SMT is enabled in the BIOS.
>
> Can you give me a link to the patch series?

Here's the link: https://marc.info/?l=linux-kernel&m=151389377606957&w=2

Thanks,
Tom
