AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.
However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.
Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.
However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.
Check for this condition, clear the return value, and set a proper error
code.
Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <[email protected]>
Cc: [email protected]
---
arch/x86/kernel/amd_nb.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 4266b64631a4..33ebf104a020 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -190,7 +190,14 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
int amd_smn_read(u16 node, u32 address, u32 *value)
{
- return __amd_smn_rw(node, address, value, false);
+ int err = __amd_smn_rw(node, address, value, false);
+
+ if (PCI_POSSIBLE_ERROR(*value)) {
+ err = -ENODEV;
+ *value = 0;
+ }
+
+ return err;
}
EXPORT_SYMBOL_GPL(amd_smn_read);
--
2.34.1
On Mon, Apr 03, 2023 at 04:42:44PM +0000, Yazen Ghannam wrote:
> int amd_smn_read(u16 node, u32 address, u32 *value)
> {
> - return __amd_smn_rw(node, address, value, false);
> + int err = __amd_smn_rw(node, address, value, false);
> +
> + if (PCI_POSSIBLE_ERROR(*value)) {
> + err = -ENODEV;
> + *value = 0;
> + }
Why not put this check in __amd_smn_rw()?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/3/23 15:32, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 04:42:44PM +0000, Yazen Ghannam wrote:
>> int amd_smn_read(u16 node, u32 address, u32 *value)
>> {
>> - return __amd_smn_rw(node, address, value, false);
>> + int err = __amd_smn_rw(node, address, value, false);
>> +
>> + if (PCI_POSSIBLE_ERROR(*value)) {
>> + err = -ENODEV;
>> + *value = 0;
>> + }
>
> Why not put this check in __amd_smn_rw()?
>
I don't think pci_write_config*() sets the PCI Error response like
pci_read_config(), AFAICT.
I think to solve the writes-ignored problem, we'd need to do another read and
compare it to what we intended to write. That could go into amd_smn_write(),
if needed. Unless the PCI kernel API has something like this.
Thanks,
Yazen
On Mon, Apr 03, 2023 at 03:40:38PM -0400, Yazen Ghannam wrote:
> I don't think pci_write_config*() sets the PCI Error response like
> pci_read_config(), AFAICT.
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 4266b64631a4..c4caade434a7 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -181,6 +181,13 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
pr_warn("Error %s SMN address 0x%x.\n",
(write ? "writing to" : "reading from"), address);
+ if (!write) {
+ if (PCI_POSSIBLE_ERROR(*value)) {
+ err = -ENODEV;
+ *value = 0;
+ }
+ }
+
out_unlock:
mutex_unlock(&smn_mutex);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/3/23 16:36, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 03:40:38PM -0400, Yazen Ghannam wrote:
>> I don't think pci_write_config*() sets the PCI Error response like
>> pci_read_config(), AFAICT.
>
> diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
> index 4266b64631a4..c4caade434a7 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -181,6 +181,13 @@ static int __amd_smn_rw(u16 node, u32 address, u32 *value, bool write)
> pr_warn("Error %s SMN address 0x%x.\n",
> (write ? "writing to" : "reading from"), address);
>
> + if (!write) {
> + if (PCI_POSSIBLE_ERROR(*value)) {
> + err = -ENODEV;
> + *value = 0;
> + }
> + }
> +
> out_unlock:
> mutex_unlock(&smn_mutex);
>
>
Yes, that's fine. Should I send another revision?
Thanks,
Yazen
On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
> Yes, that's fine. Should I send another revision?
On a second thought, I think we should do what you said in the write
function too. Because the write can fail too. So if it can, we need to
handle that potential error too.
Care to send a new version which does this check in the read and in the
write function? Basically what you had initially but with the write side
check added too to amd_smn_write.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/5/23 14:06, Borislav Petkov wrote:
> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>> Yes, that's fine. Should I send another revision?
>
> On a second thought, I think we should do what you said in the write
> function too. Because the write can fail too. So if it can, we need to
> handle that potential error too.
>
> Care to send a new version which does this check in the read and in the
> write function? Basically what you had initially but with the write side
> check added too to amd_smn_write.
>
Sure thing. I don't have a real test for the write path. But I'll test by
faking it.
Thanks,
Yazen
On 4/5/23 14:32, Yazen Ghannam wrote:
> On 4/5/23 14:06, Borislav Petkov wrote:
>> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>>> Yes, that's fine. Should I send another revision?
>>
>> On a second thought, I think we should do what you said in the write
>> function too. Because the write can fail too. So if it can, we need to
>> handle that potential error too.
>>
>> Care to send a new version which does this check in the read and in the
>> write function? Basically what you had initially but with the write side
>> check added too to amd_smn_write.
>>
>
> Sure thing. I don't have a real test for the write path. But I'll test by
> faking it.
>
So I thought about it for a bit and quickly realized the "write and read back"
method isn't robust when done here.
Possible issues:
1) Bits that are "Write-1-to-clear". In this case, we *don't* want the read to
match the write.
2) Bits that are "Read-as-Zero"/"Writes-Ignored". We can't know this
information here.
3) Bits that are "Reserved / Set to 1". Ditto above.
I think all these issues should be handled by the callers of amd_smn_write().
They should do the "write and read back" check themselves, if needed.
For #1, they can see if their target bits got cleared.
For #2 and #3, they can check if their target bits got set as intended.
This matches what we do for rdmsr/wrmsr. As long as there's no #GP, then we're
good, and the caller does their own checking.
The "PCI Error Response" for the SMN read is the only check that would apply
to *any* SMN read. So I think that makes sense to do here instead of at each
call site.
What do you think?
Thanks,
Yazen
On 4/5/23 3:10 PM, Yazen Ghannam wrote:
> On 4/5/23 14:32, Yazen Ghannam wrote:
>> On 4/5/23 14:06, Borislav Petkov wrote:
>>> On Mon, Apr 03, 2023 at 05:36:40PM -0400, Yazen Ghannam wrote:
>>>> Yes, that's fine. Should I send another revision?
>>>
>>> On a second thought, I think we should do what you said in the write
>>> function too. Because the write can fail too. So if it can, we need to
>>> handle that potential error too.
>>>
>>> Care to send a new version which does this check in the read and in the
>>> write function? Basically what you had initially but with the write side
>>> check added too to amd_smn_write.
>>>
>>
>> Sure thing. I don't have a real test for the write path. But I'll test by
>> faking it.
>>
>
> So I thought about it for a bit and quickly realized the "write and read back"
> method isn't robust when done here.
>
> Possible issues:
> 1) Bits that are "Write-1-to-clear". In this case, we *don't* want the read to
> match the write.
> 2) Bits that are "Read-as-Zero"/"Writes-Ignored". We can't know this
> information here.
> 3) Bits that are "Reserved / Set to 1". Ditto above.
>
> I think all these issues should be handled by the callers of amd_smn_write().
> They should do the "write and read back" check themselves, if needed.
>
> For #1, they can see if their target bits got cleared.
>
> For #2 and #3, they can check if their target bits got set as intended.
>
> This matches what we do for rdmsr/wrmsr. As long as there's no #GP, then we're
> good, and the caller does their own checking.
>
> The "PCI Error Response" for the SMN read is the only check that would apply
> to *any* SMN read. So I think that makes sense to do here instead of at each
> call site.
>
> What do you think?
>
Hi Boris,
Just following up. What do you think about the points above? I can send
another revision for whichever way we need to go.
Thanks,
Yazen
On Wed, Apr 05, 2023 at 03:10:09PM -0400, Yazen Ghannam wrote:
> What do you think?
Yes, please put that as a comment over __amd_smn_rw() as to why callers
should check the retval *and* make both the read and the write
__must_check and get rid of
if (err)
pr_warn("Error %s SMN address 0x%x.\n",
(write ? "writing to" : "reading from"), address);
which won't be seen in all cases. The __must_check will force the
callers to do the proper checking which is the only sane thing to do
with such a variety of bit behaviors.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 5/10/23 7:35 AM, Borislav Petkov wrote:
> On Wed, Apr 05, 2023 at 03:10:09PM -0400, Yazen Ghannam wrote:
>> What do you think?
>
> Yes, please put that as a comment over __amd_smn_rw() as to why callers
> should check the retval *and* make both the read and the write
> __must_check and get rid of
>
> if (err)
> pr_warn("Error %s SMN address 0x%x.\n",
> (write ? "writing to" : "reading from"), address);
>
> which won't be seen in all cases. The __must_check will force the
> callers to do the proper checking which is the only sane thing to do
> with such a variety of bit behaviors.
>
Understood, will do.
Thanks,
Yazen
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: c625dabbf1c4a8e77e4734014f2fde7aa9071a1f
Gitweb: https://git.kernel.org/tip/c625dabbf1c4a8e77e4734014f2fde7aa9071a1f
Author: Yazen Ghannam <[email protected]>
AuthorDate: Mon, 03 Apr 2023 16:42:44
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Wed, 05 Jun 2024 21:23:34 +02:00
x86/amd_nb: Check for invalid SMN reads
AMD Zen-based systems use a System Management Network (SMN) that
provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI
config space. The PCI config access may fail and return an error code.
This would prevent the "read" value from being updated.
However, the PCI config access may succeed, but the return value may be
invalid. This is in similar fashion to PCI bad reads, i.e. return all
bits set.
Most systems will return 0 for SMN addresses that are not accessible.
This is in line with AMD convention that unavailable registers are
Read-as-Zero/Writes-Ignored.
However, some systems will return a "PCI Error Response" instead. This
value, along with an error code of 0 from the PCI config access, will
confuse callers of the amd_smn_read() function.
Check for this condition, clear the return value, and set a proper error
code.
Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/amd_nb.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 3cf156f..027a8c7 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -215,7 +215,14 @@ out:
int amd_smn_read(u16 node, u32 address, u32 *value)
{
- return __amd_smn_rw(node, address, value, false);
+ int err = __amd_smn_rw(node, address, value, false);
+
+ if (PCI_POSSIBLE_ERROR(*value)) {
+ err = -ENODEV;
+ *value = 0;
+ }
+
+ return err;
}
EXPORT_SYMBOL_GPL(amd_smn_read);