2023-02-02 10:28:28

by Thorsten Leemhuis

Subject: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

Hi, this is your Linux kernel regression tracker.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216989 :

> [email protected] 2023-02-02 02:49:48 UTC
>
> Linux kernel >=6.1 exhibits a stuttering issue that occurs once every few hours. See https://www.reddit.com/r/archlinux/comments/zvgev0/audio_stuttering_issues_with_kernel_611/ https://www.reddit.com/r/linux_gaming/comments/zzqaf7/having_intermittent_stutters_with_a_ryzen_cpu/ https://bbs.archlinux.org/viewtopic.php?id=282333 for detailed information.
>
> The stutter lasts for 1-2 seconds and causes the framerate of the display to decrease dramatically and causes bursts in audio output.
>
> Additional info:
>
> * linux 6.1.0 or later
>
> Steps to reproduce:
>
> * Use Linux kernel >=6.1
>
> * Use AMD Ryzen CPU with fTPM enabled
>
> * Wait for a few hours
>
> Comment 1 Bell 2023-02-02 03:33:24 UTC
>
> Hey, let me add some extra information to help.
> 1. This issue can happen in 6.2-rc6 without loading third-party kernel modules (NVIDIA, VirtualBox, and so on).
> 2. Some users on desktops/laptops who were able to disable fTPM did so, which eliminated the problem.
> 3. This problem can happen on newer AMD processors from the 4000 series to the 6000 series.
> 4. I guess this problem isn't caused by the dedicated graphics card; here are some combinations in which the stuttering can happen:
> AMD(built-in GPU) + NVIDIA Laptop
> AMD(No built-in GPU) + AMD(dedicated) Desktop
> AMD(built-in GPU) + AMD(dedicated) Laptop/Desktop
> AMD + AMD(Built-in GPU only) Laptop
> All of these suffer from this.
>
> Hope this can help :)

See the ticket for more details.

I briefly looked into the links and found this:
https://www.amd.com/en/support/kb/faq/pa-410

>
> Intermittent System Stutter Experienced with fTPM Enabled on Windows® 10
> and 11
> Article Number
> PA-410
>
> This documentation provides information on improving intermittent
> performance stutter(s) on select PCs running Windows® 10 and 11 with
> Firmware Trusted Platform Module (“fTPM”) enabled.
>
>
>
> Issue Description
>
> AMD has determined that select AMD Ryzen™ system configurations may
> intermittently perform extended fTPM-related memory transactions in SPI
> flash memory (“SPIROM”) located on the motherboard, which can lead to
> temporary pauses in system interactivity or responsiveness until the
> transaction is concluded.
>
>
>
> Update and Workaround
>
> Update: Affected PCs will require a motherboard system BIOS (sBIOS)
> update containing enhanced modules for fTPM interaction with SPIROM. AMD
> expects that flashable customer sBIOS files to be available starting in
> early May, 2022. Exact BIOS availability timing for a specific
> motherboard depends on the testing and integration schedule of your
> manufacturer. Flashable updates for motherboards will be based on AMD
> AGESA 1207 (or newer).
>
> Workaround: As an immediate solution, affected customers dependent
> on fTPM functionality for Trusted Platform Module support may instead
> use a hardware TPM (“dTPM”) device for trusted computing. Platform dTPM
> modules utilize onboard non-volatile memory (NVRAM) that supersedes the
> TPM/SPIROM interaction described in this article.
>
> COMPATIBILITY: Please check with your system or motherboard
> manufacturer to ensure that your platform supports add-in dTPM modules
> before attempting or implementing this workaround.
>
> WARNING: If switching an active system from fTPM to dTPM, it is
> critical that you disable TPM-backed encryption systems (e.g. BitLocker
> Drive Encryption) and/or back up vital system data prior to switching
> TPM devices. You must have full administrative access to the system, or
> explicit support from your IT administrator if the system is managed.
> For more information on transferring ownership to a new TPM device,
> please visit this Microsoft webpage.
>

So it's a firmware problem, but apparently one that Linux only triggers
since 6.1.

Jason, could the hwrng changes have anything to do with this?

A bisection really would be helpful, but I guess that is not easy as the
problem apparently only shows up after some time...


Anyway:

[TLDR for the rest of this mail: I'm adding this report to the list of
tracked Linux kernel regressions; the text you find below is based on a
few template paragraphs you might have encountered already in similar
form.]

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: v6.0..v6.1
https://bugzilla.kernel.org/show_bug.cgi?id=216989
#regzbot title: tpm: systems with AMD Ryzen stutter when fTPM is enabled
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (e.g. the bugzilla ticket and maybe this mail as well, if
this thread sees some discussion). See the page linked in the footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


2023-02-02 12:57:55

by James Bottomley

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Thu, 2023-02-02 at 11:28 +0100, Linux kernel regression tracking
(Thorsten Leemhuis) wrote:
[...]
> So it's a firmware problem, but apparently one that Linux only
> triggers since 6.1.
>
> Jason, could the hwrng changes have anything to do with this?
>
> A bisection really would be helpful, but I guess that is not easy as
> the problem apparently only shows up after some time...

the problem description says the fTPM causes system stutter when it
writes to NVRAM. Since an fTPM is a proprietary implementation, we
don't know what it does. The ms TPM implementation definitely doesn't
trigger NV writes on rng requests, but it is plausible this fTPM does
... particularly if they have a time based input to the DRNG. Even if
this speculation is true, there's not much we can do about it, since
it's a firmware bug and AMD should have delivered the BIOS update that
fixes it.

The way to test this would be to set the config option

CONFIG_HW_RANDOM_TPM=n

and see if the stutter goes away. I suppose if someone could quantify
the bad bioses, we could warn, but that's about it.

James
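
Before rebuilding a kernel with that option turned off, it may be worth checking whether the TPM is registered as a hwrng source at all on the affected machine. The following is a minimal Python sketch that reads the hw_random sysfs files. The `tpm-rng` name prefix and the helper names are assumptions made for illustration; they do not come from this thread, so compare against what `rng_available` actually lists on your system.

```python
from pathlib import Path

# sysfs directory where the hw_random core exposes its state.
HW_RANDOM_DIR = Path("/sys/class/misc/hw_random")


def read_hwrng_state(base=HW_RANDOM_DIR):
    """Return (current_source, available_sources) as reported by sysfs."""
    base = Path(base)

    def read(name):
        try:
            return (base / name).read_text().strip()
        except OSError:
            # File missing or unreadable, e.g. hw_random is not available.
            return ""

    return read("rng_current"), read("rng_available").split()


def tpm_is_hwrng_source(current, available, prefix="tpm-rng"):
    """Return (tpm_is_current, tpm_is_registered).

    The "tpm-rng" prefix is an assumption about how the TPM driver
    names its hwrng device; adjust it if your system differs.
    """
    is_current = current.startswith(prefix)
    is_registered = any(name.startswith(prefix) for name in available)
    return is_current, is_registered
```

Calling `tpm_is_hwrng_source(*read_hwrng_state())` on a running system gives a quick yes/no answer. On a kernel built with CONFIG_HW_RANDOM_TPM=n one would expect both values to be False, which is a cheap sanity check that the test kernel is really the one that booted.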


2023-02-05 17:38:31

by Thorsten Leemhuis

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

[ccing Dominik (who authored the culprit) and Herbert (who merged it)]

On 02.02.23 11:28, Linux kernel regression tracking (Thorsten Leemhuis)
wrote:
>
> I noticed a regression report in bugzilla.kernel.org. As many (most?)
> kernel developers don't keep an eye on it, I decided to forward it by
> mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216989 :

Turns out, according to a bisection from one of the reporters, that
b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted
sources") (merged for 6.1) apparently makes this hardware issue occur
a lot quicker/more frequently on any board that didn't get the
firmware update yet. So it could be argued that from the kernel's point
of view it *might* be considered a regression.

For details see the ticket.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot introduced: b006c439d58d


2023-02-07 22:14:01

by Jason A. Donenfeld

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Sun, Feb 05, 2023 at 06:38:16PM +0100, Linux kernel regression tracking (Thorsten Leemhuis) wrote:
> [ccing Dominik (who authored the culprit) and Herbert (who merged it)]
>
> On 02.02.23 11:28, Linux kernel regression tracking (Thorsten Leemhuis)
> wrote:
> >
> > I noticed a regression report in bugzilla.kernel.org. As many (most?)
> > kernel developers don't keep an eye on it, I decided to forward it by
> > mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216989 :
>
> Turns out, according to a bisection from one of the reporters, that
> b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted
> sources") (merged for 6.1) apparently makes this hardware issue occur
> a lot quicker/more frequently on any board that didn't get the
> firmware update yet. So it could be argued that from the kernel's point
> of view it *might* be considered a regression.

Finally replying without HTML, now that I'm at my laptop.

This isn't a bug with the commit you mentioned. Rather, this is a bug in
the TPM hardware and/or in the TPM driver. Probably the TPM driver
should quirk around the faulty BIOS to disable whatever functionality is
broken, or it should notice these latency spikes and warn, or something.
But I'll leave that to James; that's his wheelhouse.

Jason

2023-02-08 02:13:23

by Jarkko Sakkinen

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Thu, Feb 02, 2023 at 07:57:37AM -0500, James Bottomley wrote:
> On Thu, 2023-02-02 at 11:28 +0100, Linux kernel regression tracking
> (Thorsten Leemhuis) wrote:
> [...]
> > So it's a firmware problem, but apparently one that Linux only
> > triggers since 6.1.
> >
> > Jason, could the hwrng changes have anything to do with this?
> >
> > A bisection really would be helpful, but I guess that is not easy as
> > the problem apparently only shows up after some time...
>
> the problem description says the fTPM causes system stutter when it
> writes to NVRAM. Since an fTPM is a proprietary implementation, we
> don't know what it does. The ms TPM implementation definitely doesn't
> trigger NV writes on rng requests, but it is plausible this fTPM does
> ... particularly if they have a time based input to the DRNG. Even if
> this speculation is true, there's not much we can do about it, since
> it's a firmware bug and AMD should have delivered the BIOS update that
> fixes it.
>
> The way to test this would be to set the config option
>
> CONFIG_HW_RANDOM_TPM=n
>
> and see if the stutter goes away. I suppose if someone could quantify
> the bad bioses, we could warn, but that's about it.
>
> James
>

And I, for one, do not have a Ryzen CPU, so it is pretty hard to answer such a question.

BR, Jarkko

2023-02-08 02:14:20

by Jarkko Sakkinen

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Wed, Feb 08, 2023 at 04:13:16AM +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 02, 2023 at 07:57:37AM -0500, James Bottomley wrote:
> > On Thu, 2023-02-02 at 11:28 +0100, Linux kernel regression tracking
> > (Thorsten Leemhuis) wrote:
> > [...]
> > > So it's a firmware problem, but apparently one that Linux only
> > > triggers since 6.1.
> > >
> > > Jason, could the hwrng changes have anything to do with this?
> > >
> > > A bisection really would be helpful, but I guess that is not easy as
> > > the problem apparently only shows up after some time...
> >
> > the problem description says the fTPM causes system stutter when it
> > writes to NVRAM. Since an fTPM is a proprietary implementation, we
> > don't know what it does. The ms TPM implementation definitely doesn't
> > trigger NV writes on rng requests, but it is plausible this fTPM does
> > ... particularly if they have a time based input to the DRNG. Even if
> > this speculation is true, there's not much we can do about it, since
> > it's a firmware bug and AMD should have delivered the BIOS update that
> > fixes it.
> >
> > The way to test this would be to set the config option
> >
> > CONFIG_HW_RANDOM_TPM=n
> >
> > and see if the stutter goes away. I suppose if someone could quantify
> > the bad bioses, we could warn, but that's about it.
> >
> > James
> >
>
> And I, for one, do not have a Ryzen CPU, so it is pretty hard to answer such a question.

... about hwrng

BR, Jarkko

2023-02-08 02:32:04

by Jason A. Donenfeld

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Tue, Feb 7, 2023 at 11:13 PM Jarkko Sakkinen <[email protected]> wrote:
>
> On Wed, Feb 08, 2023 at 04:13:16AM +0200, Jarkko Sakkinen wrote:
> > On Thu, Feb 02, 2023 at 07:57:37AM -0500, James Bottomley wrote:
> > > On Thu, 2023-02-02 at 11:28 +0100, Linux kernel regression tracking
> > > (Thorsten Leemhuis) wrote:
> > > [...]
> > > > So it's a firmware problem, but apparently one that Linux only
> > > > triggers since 6.1.
> > > >
> > > > Jason, could the hwrng changes have anything to do with this?
> > > >
> > > > A bisection really would be helpful, but I guess that is not easy as
> > > > the problem apparently only shows up after some time...
> > >
> > > the problem description says the fTPM causes system stutter when it
> > > writes to NVRAM. Since an fTPM is a proprietary implementation, we
> > > don't know what it does. The ms TPM implementation definitely doesn't
> > > trigger NV writes on rng requests, but it is plausible this fTPM does
> > > ... particularly if they have a time based input to the DRNG. Even if
> > > this speculation is true, there's not much we can do about it, since
> > > it's a firmware bug and AMD should have delivered the BIOS update that
> > > fixes it.
> > >
> > > The way to test this would be to set the config option
> > >
> > > CONFIG_HW_RANDOM_TPM=n
> > >
> > > and see if the stutter goes away. I suppose if someone could quantify
> > > the bad bioses, we could warn, but that's about it.
> > >
> > > James
> > >
> >
> > And I, for one, do not have a Ryzen CPU, so it is pretty hard to answer such a question.
>
> ... about hwrng

Well, the options here are basically:

a) Do nothing, and just expect people to update their BIOSes, since an
update is available.
b) Do nothing, and expect people with broken BIOSes to `echo blacklist
tpm >> /etc/modprobesomethingsomething`.
c) Figure out how to identify the buggy BIOS and disable the TPM's rng
with a quirk in this case.
d) Figure out how to dynamically detect TPM rng latency, and warn about it.
e) Figure out how to dynamically detect TPM rng latency, and disable it.

I think given that a firmware update *is* available, (a) is fine. And
the generic workaround remains (b). But if you want to be really nice,
(c) would be fine too. Somebody with the affected hardware would
probably have to send in some DMI logs or whatever else. (d) and (e)
sound possible in theory but I dunno really... seems finicky.

Jason
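
For option (c), someone with affected hardware would need to report what their firmware identifies itself as. A minimal Python sketch of collecting that from the kernel's DMI sysfs directory could look like the following. The choice of fields and the function names are assumptions made for illustration, not something agreed on in this thread, and the AGESA version named in the AMD article is not guaranteed to show up in any of these fields.

```python
from pathlib import Path

# sysfs directory where the kernel exposes DMI/SMBIOS identification strings.
DMI_DIR = Path("/sys/class/dmi/id")

# Identification fields only; serial numbers and UUIDs are deliberately
# left out so the report can be posted publicly.
DMI_FIELDS = (
    "sys_vendor",
    "product_name",
    "board_vendor",
    "board_name",
    "bios_vendor",
    "bios_version",
    "bios_date",
)


def collect_dmi(base=DMI_DIR, fields=DMI_FIELDS):
    """Return a dict mapping each field to its value, or None if unreadable."""
    info = {}
    for field in fields:
        try:
            info[field] = (Path(base) / field).read_text().strip()
        except OSError:
            info[field] = None
    return info


def format_report(info):
    """Render the collected fields as 'name: value' lines."""
    lines = []
    for name, value in info.items():
        shown = value if value is not None else "(unavailable)"
        lines.append(f"{name}: {shown}")
    return "\n".join(lines)
```

Running `print(format_report(collect_dmi()))` on a stuttering system and on one that has received the firmware update would give two reports to compare, which is roughly the information a quirk table would have to match on.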

2023-02-08 02:52:16

by Jarkko Sakkinen

Subject: Re: [regression] Bug 216989 - since 6.1 systems with AMD Ryzen stutter when fTPM is enabled

On Tue, Feb 07, 2023 at 11:31:37PM -0300, Jason A. Donenfeld wrote:
> On Tue, Feb 7, 2023 at 11:13 PM Jarkko Sakkinen <[email protected]> wrote:
> >
> > On Wed, Feb 08, 2023 at 04:13:16AM +0200, Jarkko Sakkinen wrote:
> > > On Thu, Feb 02, 2023 at 07:57:37AM -0500, James Bottomley wrote:
> > > > On Thu, 2023-02-02 at 11:28 +0100, Linux kernel regression tracking
> > > > (Thorsten Leemhuis) wrote:
> > > > [...]
> > > > > So it's a firmware problem, but apparently one that Linux only
> > > > > triggers since 6.1.
> > > > >
> > > > > Jason, could the hwrng changes have anything to do with this?
> > > > >
> > > > > A bisection really would be helpful, but I guess that is not easy as
> > > > > the problem apparently only shows up after some time...
> > > >
> > > > the problem description says the fTPM causes system stutter when it
> > > > writes to NVRAM. Since an fTPM is a proprietary implementation, we
> > > > don't know what it does. The ms TPM implementation definitely doesn't
> > > > trigger NV writes on rng requests, but it is plausible this fTPM does
> > > > ... particularly if they have a time based input to the DRNG. Even if
> > > > this speculation is true, there's not much we can do about it, since
> > > > it's a firmware bug and AMD should have delivered the BIOS update that
> > > > fixes it.
> > > >
> > > > The way to test this would be to set the config option
> > > >
> > > > CONFIG_HW_RANDOM_TPM=n
> > > >
> > > > and see if the stutter goes away. I suppose if someone could quantify
> > > > the bad bioses, we could warn, but that's about it.
> > > >
> > > > James
> > > >
> > >
> > > And I, for one, do not have a Ryzen CPU, so it is pretty hard to answer such a question.
> >
> > ... about hwrng
>
> Well, the options here are basically:
>
> a) Do nothing, and just expect people to update their BIOSes, since an
> update is available.
> b) Do nothing, and expect people with broken BIOSes to `echo blacklist
> tpm >> /etc/modprobesomethingsomething`.
> c) Figure out how to identify the buggy BIOS and disable the TPM's rng
> with a quirk in this case.
> d) Figure out how to dynamically detect TPM rng latency, and warn about it.
> e) Figure out how to dynamically detect TPM rng latency, and disable it.
>
> I think given that a firmware update *is* available, (a) is fine. And
> the generic workaround remains (b). But if you want to be really nice,
> (c) would be fine too. Somebody with the affected hardware would
> probably have to send in some DMI logs or whatever else. (d) and (e)
> sound possible in theory but I dunno really... seems finicky.
>
> Jason

For now (a), but if someone with capable hardware can come up with
something, I'm happy to take it through, if it makes sense.

BR, Jarkko
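
For anyone with affected hardware who wants to explore options (d) or (e) from userspace before touching the driver, a rough Python sketch of timing reads and flagging slow ones is shown below. It only illustrates the idea of latency detection and is not how the kernel would implement it. The function names, the 32-byte read size and the 0.5 second threshold are assumptions; the threshold is picked only because the report describes stutters of 1-2 seconds.

```python
import time


def time_reads(read_chunk, count, clock=time.monotonic):
    """Call read_chunk() count times and return the latency of each call."""
    latencies = []
    for _ in range(count):
        start = clock()
        read_chunk()
        latencies.append(clock() - start)
    return latencies


def find_spikes(latencies, threshold_s):
    """Return (index, latency) pairs for reads slower than threshold_s."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold_s]


def measure_device(path="/dev/hwrng", count=100, size=32, threshold_s=0.5):
    """Time reads from a hwrng character device and return the spikes.

    Assumes the device is readable (usually root only) and that the TPM
    is the currently selected hwrng source.
    """
    with open(path, "rb", buffering=0) as dev:
        latencies = time_reads(lambda: dev.read(size), count)
    return find_spikes(latencies, threshold_s)
```

Since the stutter reportedly shows up only every few hours, a single short run of `measure_device()` is unlikely to catch it; the call would have to be repeated over a long period, and an empty result proves nothing on its own.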