2023-03-14 08:55:14

by Huacai Chen

[permalink] [raw]
Subject: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

LoongArch maintains cache coherency in hardware, but when works with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, may be fixed in newer
chipsets).

This means WUC can only used for write-only memory regions now, so this
option is disabled by default (means WUC falls back to SUC for ioremap).
You can enable this option if the kernel is ensured to run on bug-free
hardwares.

Suggested-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
---
arch/loongarch/Kconfig | 14 ++++++++++++++
arch/loongarch/include/asm/io.h | 5 +++++
2 files changed, 19 insertions(+)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 0d11738a861a..e3f5c422636f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -446,6 +446,20 @@ config ARCH_IOREMAP
protection support. However, you can enable LoongArch DMW-based
ioremap() for better performance.

+config ARCH_WRITECOMBINE
+ bool "Enable WriteCombine (WUC) for ioremap()"
+ help
+ LoongArch maintains cache coherency in hardware, but with LS7A
+ chipsets the WUC attribute (Weak-ordered UnCached, which is similar
+ to WriteCombine) is out of the scope of cache coherency machanism
+ for PCIe devices (this is a PCIe protocol violation, may be fixed
+ in newer chipsets).
+
+ This means WUC can only used for write-only memory regions now, so
+ this option is disabled by default (means WUC falls back to SUC for
+ ioremap). You can enable this option if the kernel is ensured to run
+ on bug-free hardwares.
+
config ARCH_STRICT_ALIGN
bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
default y
diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
index 402a7d9e3a53..079ef897ed1a 100644
--- a/arch/loongarch/include/asm/io.h
+++ b/arch/loongarch/include/asm/io.h
@@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
* @offset: bus address of the memory
* @size: size of the resource to map
*/
+#ifdef CONFIG_ARCH_WRITECOMBINE
#define ioremap_wc(offset, size) \
ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
+#else
+#define ioremap_wc(offset, size) \
+ ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
+#endif

#define ioremap_cache(offset, size) \
ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
--
2.39.1



2023-03-14 09:41:14

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> LoongArch maintains cache coherency in hardware, but when works with
> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> to WriteCombine) is out of the scope of cache coherency machanism for
> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> chipsets).

IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?

> This means WUC can only used for write-only memory regions now, so this
> option is disabled by default (means WUC falls back to SUC for ioremap).
> You can enable this option if the kernel is ensured to run on bug-free
> hardwares.

Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
automatically from the vendor:device ID of the PCI root controller?
Then we don't need to rely on the user or distro maintainer to select an
option. I see there is already many architecture-dependant #if
directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
in PCI tree...

If a PCI quirk is not possible, then is it possible to make a kernel
command line option, leaving this CONFIG as the default value of the
option? I guess in the future many LoongArch users will just install a
binary distro, then it would be much easier to edit grub.cfg than
rebuilding the kernel when they finally buy a compliant PCIe controller.

> Suggested-by: WANG Xuerui <[email protected]>
> Signed-off-by: Huacai Chen <[email protected]>
> ---
>  arch/loongarch/Kconfig          | 14 ++++++++++++++
>  arch/loongarch/include/asm/io.h |  5 +++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 0d11738a861a..e3f5c422636f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -446,6 +446,20 @@ config ARCH_IOREMAP
>           protection support. However, you can enable LoongArch DMW-based
>           ioremap() for better performance.
>  
> +config ARCH_WRITECOMBINE
> +       bool "Enable WriteCombine (WUC) for ioremap()"
> +       help
> +         LoongArch maintains cache coherency in hardware, but with LS7A
> +         chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> +         to WriteCombine) is out of the scope of cache coherency machanism
> +         for PCIe devices (this is a PCIe protocol violation, may be fixed
> +         in newer chipsets).
> +
> +         This means WUC can only used for write-only memory regions now, so
> +         this option is disabled by default (means WUC falls back to SUC for
> +         ioremap). You can enable this option if the kernel is ensured to run
> +         on bug-free hardwares.
> +
>  config ARCH_STRICT_ALIGN
>         bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
>         default y
> diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> index 402a7d9e3a53..079ef897ed1a 100644
> --- a/arch/loongarch/include/asm/io.h
> +++ b/arch/loongarch/include/asm/io.h
> @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
>   * @offset:    bus address of the memory
>   * @size:      size of the resource to map
>   */
> +#ifdef CONFIG_ARCH_WRITECOMBINE
>  #define ioremap_wc(offset, size)       \
>         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> +#else
> +#define ioremap_wc(offset, size)       \
> +       ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> +#endif
>  
>  #define ioremap_cache(offset, size)    \
>         ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2023-03-14 10:02:40

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

Hi, Ruoyao,

On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <[email protected]> wrote:
>
> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> > LoongArch maintains cache coherency in hardware, but when works with
> > LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> > to WriteCombine) is out of the scope of cache coherency machanism for
> > PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> > chipsets).
>
> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
Yes, very unfortunately, but this issue is only observed in the amdgpu
driver now.

>
> > This means WUC can only used for write-only memory regions now, so this
> > option is disabled by default (means WUC falls back to SUC for ioremap).
> > You can enable this option if the kernel is ensured to run on bug-free
> > hardwares.
>
> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> automatically from the vendor:device ID of the PCI root controller?
> Then we don't need to rely on the user or distro maintainer to select an
> option. I see there is already many architecture-dependant #if
> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> in PCI tree...
Not a good idea, pci quirks need too long a time to review, and we
don't know when this issue can be fixed in hardware.

>
> If a PCI quirk is not possible, then is it possible to make a kernel
> command line option, leaving this CONFIG as the default value of the
> option? I guess in the future many LoongArch users will just install a
> binary distro, then it would be much easier to edit grub.cfg than
> rebuilding the kernel when they finally buy a compliant PCIe controller.
If we use command line parameter, we can remove this Kconfig option.

Huacai
>
> > Suggested-by: WANG Xuerui <[email protected]>
> > Signed-off-by: Huacai Chen <[email protected]>
> > ---
> > arch/loongarch/Kconfig | 14 ++++++++++++++
> > arch/loongarch/include/asm/io.h | 5 +++++
> > 2 files changed, 19 insertions(+)
> >
> > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> > index 0d11738a861a..e3f5c422636f 100644
> > --- a/arch/loongarch/Kconfig
> > +++ b/arch/loongarch/Kconfig
> > @@ -446,6 +446,20 @@ config ARCH_IOREMAP
> > protection support. However, you can enable LoongArch DMW-based
> > ioremap() for better performance.
> >
> > +config ARCH_WRITECOMBINE
> > + bool "Enable WriteCombine (WUC) for ioremap()"
> > + help
> > + LoongArch maintains cache coherency in hardware, but with LS7A
> > + chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> > + to WriteCombine) is out of the scope of cache coherency machanism
> > + for PCIe devices (this is a PCIe protocol violation, may be fixed
> > + in newer chipsets).
> > +
> > + This means WUC can only used for write-only memory regions now, so
> > + this option is disabled by default (means WUC falls back to SUC for
> > + ioremap). You can enable this option if the kernel is ensured to run
> > + on bug-free hardwares.
> > +
> > config ARCH_STRICT_ALIGN
> > bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
> > default y
> > diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> > index 402a7d9e3a53..079ef897ed1a 100644
> > --- a/arch/loongarch/include/asm/io.h
> > +++ b/arch/loongarch/include/asm/io.h
> > @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
> > * @offset: bus address of the memory
> > * @size: size of the resource to map
> > */
> > +#ifdef CONFIG_ARCH_WRITECOMBINE
> > #define ioremap_wc(offset, size) \
> > ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> > +#else
> > +#define ioremap_wc(offset, size) \
> > + ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> > +#endif
> >
> > #define ioremap_cache(offset, size) \
> > ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))
>
> --
> Xi Ruoyao <[email protected]>
> School of Aerospace Science and Technology, Xidian University

2023-03-14 10:08:49

by WANG Xuerui

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

On 2023/3/14 16:54, Huacai Chen wrote:
> LoongArch maintains cache coherency in hardware, but when works with

but when paired with current LS7A chipsets

> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> to WriteCombine) is out of the scope of cache coherency machanism for
> PCIe devices (this is a PCIe protocol violation, may be fixed in newer

which may be fixed

> chipsets).
>
> This means WUC can only used for write-only memory regions now, so this
> option is disabled by default (means WUC falls back to SUC for ioremap).

by default, making ioremap_wc silently fallback to SUC.

> You can enable this option if the kernel is ensured to run on bug-free
> hardwares.

if you can ensure the kernel will always be running on hardware without
this bug

>
> Suggested-by: WANG Xuerui <[email protected]>
> Signed-off-by: Huacai Chen <[email protected]>
> ---
> arch/loongarch/Kconfig | 14 ++++++++++++++
> arch/loongarch/include/asm/io.h | 5 +++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 0d11738a861a..e3f5c422636f 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -446,6 +446,20 @@ config ARCH_IOREMAP
> protection support. However, you can enable LoongArch DMW-based
> ioremap() for better performance.
>
> +config ARCH_WRITECOMBINE
> + bool "Enable WriteCombine (WUC) for ioremap()"
> + help
> + LoongArch maintains cache coherency in hardware, but with LS7A
> + chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> + to WriteCombine) is out of the scope of cache coherency machanism
> + for PCIe devices (this is a PCIe protocol violation, may be fixed
> + in newer chipsets).
> +
> + This means WUC can only used for write-only memory regions now, so
> + this option is disabled by default (means WUC falls back to SUC for
> + ioremap). You can enable this option if the kernel is ensured to run
> + on bug-free hardwares.

Fix text here like with the commit message.

Then add a "If unsure, say N" line to serve as a warning?

> +
> config ARCH_STRICT_ALIGN
> bool "Enable -mstrict-align to prevent unaligned accesses" if EXPERT
> default y
> diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h
> index 402a7d9e3a53..079ef897ed1a 100644
> --- a/arch/loongarch/include/asm/io.h
> +++ b/arch/loongarch/include/asm/io.h
> @@ -54,8 +54,13 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size,
> * @offset: bus address of the memory
> * @size: size of the resource to map
> */
> +#ifdef CONFIG_ARCH_WRITECOMBINE
> #define ioremap_wc(offset, size) \
> ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_WUC))
> +#else
> +#define ioremap_wc(offset, size) \
> + ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL_SUC))
> +#endif
>
> #define ioremap_cache(offset, size) \
> ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL))

I'll test this later tonight with my RX 6400. See you in a few hours...

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/


2023-03-14 10:12:02

by Xi Ruoyao

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

On Tue, 2023-03-14 at 18:02 +0800, Huacai Chen wrote:
> > Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> > automatically from the vendor:device ID of the PCI root controller?
> > Then we don't need to rely on the user or distro maintainer to select an
> > option.  I see there is already many architecture-dependant #if
> > directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> > in PCI tree...
> Not a good idea, pci quirks need too long a time to review, and we
> don't know when this issue can be fixed in hardware.

How about make a PCI fixup in arch/loongarch/pci/pci.c then? I think we
just need to set a flag in the fixup, then check the flag in
ioremap_wc...

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2023-03-14 10:22:30

by WANG Xuerui

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

On 2023/3/14 18:02, Huacai Chen wrote:
> Hi, Ruoyao,
>
> On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <[email protected]> wrote:
>>
>> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
>>> LoongArch maintains cache coherency in hardware, but when works with
>>> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
>>> to WriteCombine) is out of the scope of cache coherency machanism for
>>> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
>>> chipsets).
>>
>> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
> Yes, very unfortunately, but this issue is only observed in the amdgpu
> driver now.

It's PCIe protocol violation after all, and we can never be sure about
the vast amount of hardware untested on loongarch after all. Miserable
as the performance hit may get, we don't really have another choice,
unfortunately. Someone needs to lecture the LS7A team real hard!

>>
>>> This means WUC can only used for write-only memory regions now, so this
>>> option is disabled by default (means WUC falls back to SUC for ioremap).
>>> You can enable this option if the kernel is ensured to run on bug-free
>>> hardwares.
>>
>> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
>> automatically from the vendor:device ID of the PCI root controller?
>> Then we don't need to rely on the user or distro maintainer to select an
>> option. I see there is already many architecture-dependant #if
>> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
>> in PCI tree...
> Not a good idea, pci quirks need too long a time to review, and we
> don't know when this issue can be fixed in hardware.
>
>>
>> If a PCI quirk is not possible, then is it possible to make a kernel
>> command line option, leaving this CONFIG as the default value of the
>> option? I guess in the future many LoongArch users will just install a
>> binary distro, then it would be much easier to edit grub.cfg than
>> rebuilding the kernel when they finally buy a compliant PCIe controller.
> If we use command line parameter, we can remove this Kconfig option.

An option is still useful as specifying the compile-time default for
such a kernel parameter, IMO.

--
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/


2023-03-14 11:10:19

by Huacai Chen

[permalink] [raw]
Subject: Re: [PATCH] LoongArch: Make WriteCombine configurable for ioremap()

On Tue, Mar 14, 2023 at 6:12 PM WANG Xuerui <[email protected]> wrote:
>
> On 2023/3/14 18:02, Huacai Chen wrote:
> > Hi, Ruoyao,
> >
> > On Tue, Mar 14, 2023 at 5:41 PM Xi Ruoyao <[email protected]> wrote:
> >>
> >> On Tue, 2023-03-14 at 16:54 +0800, Huacai Chen wrote:
> >>> LoongArch maintains cache coherency in hardware, but when works with
> >>> LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
> >>> to WriteCombine) is out of the scope of cache coherency machanism for
> >>> PCIe devices (this is a PCIe protocol violation, may be fixed in newer
> >>> chipsets).
> >>
> >> IIUC all launched LS7A models (7A1000 and 7A2000) suffers this issue?
> > Yes, very unfortunately, but this issue is only observed in the amdgpu
> > driver now.
>
> It's PCIe protocol violation after all, and we can never be sure about
> the vast amount of hardware untested on loongarch after all. Miserable
> as the performance hit may get, we don't really have another choice,
> unfortunately. Someone needs to lecture the LS7A team real hard!
>
> >>
> >>> This means WUC can only used for write-only memory regions now, so this
> >>> option is disabled by default (means WUC falls back to SUC for ioremap).
> >>> You can enable this option if the kernel is ensured to run on bug-free
> >>> hardwares.
> >>
> >> Hmm, is it possible to make a PCI quirk so SUC/WUC will be decided
> >> automatically from the vendor:device ID of the PCI root controller?
> >> Then we don't need to rely on the user or distro maintainer to select an
> >> option. I see there is already many architecture-dependant #if
> >> directives in drivers/pci/quirks.c so I guess such a quirk is acceptable
> >> in PCI tree...
> > Not a good idea, pci quirks need too long a time to review, and we
> > don't know when this issue can be fixed in hardware.
> >
> >>
> >> If a PCI quirk is not possible, then is it possible to make a kernel
> >> command line option, leaving this CONFIG as the default value of the
> >> option? I guess in the future many LoongArch users will just install a
> >> binary distro, then it would be much easier to edit grub.cfg than
> >> rebuilding the kernel when they finally buy a compliant PCIe controller.
> > If we use command line parameter, we can remove this Kconfig option.
>
> An option is still useful as specifying the compile-time default for
> such a kernel parameter, IMO.
I will update commit messages and add a kernel parameter, thanks.

Huacai
>
> --
> WANG "xen0n" Xuerui
>
> Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
>
>