This series is based on the alternatives changes done in my svpbmt series
and thus also depends on Atish's isa-extension parsing series.
It implements using the cache-management instructions from the Zicbom
extension to handle cache flush etc. actions on platforms needing them.
SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
different set of cache instructions. But while the instructions are
different, they provide the same functionality, so a variant can
easily hook into the existing alternatives mechanism on those.
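As a rough illustration of how the two variants relate, the selection can
be modelled in plain user-space C. This is only a sketch under assumptions:
pick_cmo_variant() and enum cmo_variant are hypothetical names that do not
exist in the kernel, the vendor-ID constant is assumed to match the kernel's
THEAD_VENDOR_ID, and preferring the T-Head variant when both would match is
an arbitrary choice of the sketch, not something the series defines.

```c
#include <stdbool.h>

/* Assumed to match THEAD_VENDOR_ID in the kernel's vendor ID list. */
#define SKETCH_THEAD_VENDOR_ID 0x5b7UL

/* Hypothetical names, used only for this illustration. */
enum cmo_variant {
	CMO_VARIANT_NONE,	/* nops stay in place, nothing is patched */
	CMO_VARIANT_ZICBOM,	/* loop around cbo.<op> */
	CMO_VARIANT_THEAD,	/* loop around the pre-coded T-Head ops + sync.s */
};

static enum cmo_variant pick_cmo_variant(bool early_boot, bool has_zicbom,
					 unsigned long vendor_id,
					 unsigned long arch_id,
					 unsigned long imp_id)
{
	/* Both probe functions in the series decline during early boot. */
	if (early_boot)
		return CMO_VARIANT_NONE;

	/* The T-Head probe only accepts cores with arch_id == impid == 0. */
	if (vendor_id == SKETCH_THEAD_VENDOR_ID && arch_id == 0 && imp_id == 0)
		return CMO_VARIANT_THEAD;

	/* Otherwise fall back to Zicbom if the ISA string reports it. */
	if (has_zicbom)
		return CMO_VARIANT_ZICBOM;

	return CMO_VARIANT_NONE;
}
```

In the kernel itself this decision is not a single function; it falls out of
the cpufeature and vendor-errata probe callbacks patching the same
alternative site.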
An ongoing discussion is about the currently used pre-coded
instructions. Palmer's current thinking is that we should wait
until the relevant instructions have landed in binutils.
The main Zicbom instructions are in toolchains now and at least
Debian also carries a binutils snapshot with it, but the T-Head
variant still uses pre-coded instructions for now.
The series sits on top of my svpbmt fixup series, which
for example includes the conversion away from function pointers
for the check-functions.
It also uses my nops-series to shorten multiple nop statements:
https://lore.kernel.org/r/[email protected]
A new dma-noncoherent property was added for the devicetree-specification
and dt-schema in:
- https://www.spinics.net/lists/devicetree-spec/msg01053.html
- https://github.com/devicetree-org/dt-schema/pull/78
The dtschema-patch was already merged and patch1 in this series
got a reviewed-by from Rob, so I guess that new property should be
ok to use.
changes in v7:
- add recently received review-tags
- fix wrong rv32 mabi when testing for Zicbom in Kconfig
changes in v6:
- add recently received review-tags
- adapt non-coherent patch subject as suggested by Christoph Hellwig
changes in v5:
- beautify of_dma_is_coherent as suggested by Christoph Hellwig
- WARN_TAINT when ARCH_DMA_MINALIGN smaller than riscv,cbom-block-size
(similar to how arm64 does this)
- add a function to track if non-coherent handling is available
- WARN_TAINT if a device is non-coherent but no non-coherent handling
- use clean instead of inval in arch_sync_dma_for_device:DMA_FROM_DEVICE
hopefully I understood
https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
correctly in this
changes in v4:
- modify of_dma_is_coherent() to also handle coherent systems
with possibly non-coherent devices
- move Zicbom to use real instructions
- split off the actual dma-noncoherent code from the Zicbom
extension
- Don't assume devices are non-coherent, instead default to
coherent and require the non-coherent ones to be marked
- CPUFEATURE_ZICBOM instead of CPUFEATURE_CMO
- fix used cache addresses
- drop some unused headers from dma-noncoherent.c
- move unsigned long cast when calling ALT_CMO_OP
- remove unneeded memset-0
- define ARCH_DMA_MINALIGN
- use flush instead of inval in arch_sync_dma_for_cpu()
- depend on !XIP_KERNEL
- trim some line lengths
- improve Kconfig description
changes in v3:
- rebase onto 5.19-rc1 + svpbmt-fixup-series
- adapt wording for block-size binding
- include asm/cacheflush.h into dma-noncoherent to fix the
no-prototype error clang seems to generate
- use __nops macro for readability
- add some received tags
- add a0 to the clobber list
changes in v2:
- cbom-block-size is hardware-specific and comes from firmware
- update Kconfig name to use the ISA extension name
- select the ALTERNATIVES symbol when enabled
- shorten the line lengths of the errata-assembly
Heiko Stuebner (4):
of: also handle dma-noncoherent in of_dma_is_coherent()
dt-bindings: riscv: document cbom-block-size
riscv: Add support for non-coherent devices using zicbom extension
riscv: implement cache-management errata for T-Head SoCs
.../devicetree/bindings/riscv/cpus.yaml | 5 +
arch/riscv/Kconfig | 31 +++++
arch/riscv/Kconfig.erratas | 11 ++
arch/riscv/Makefile | 4 +
arch/riscv/errata/thead/errata.c | 20 ++++
arch/riscv/include/asm/cache.h | 4 +
arch/riscv/include/asm/cacheflush.h | 10 ++
arch/riscv/include/asm/errata_list.h | 59 ++++++++-
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 24 ++++
arch/riscv/kernel/setup.c | 2 +
arch/riscv/mm/Makefile | 1 +
arch/riscv/mm/dma-noncoherent.c | 112 ++++++++++++++++++
drivers/of/address.c | 17 +--
15 files changed, 293 insertions(+), 9 deletions(-)
create mode 100644 arch/riscv/mm/dma-noncoherent.c
--
2.35.1
The Zicbom ISA-extension was ratified in November 2021
and introduces instructions for dcache invalidate, clean
and flush operations.
Implement cache management operations for non-coherent devices
based on them.
Of course not all cores will support this, so implement an
alternative-based mechanism that replaces nops with a
sequence built around the Zicbom instructions.
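The address walk done by that patched-in sequence can be modelled in
user-space C. This is a minimal sketch, not the kernel macro: visit_blocks()
is a hypothetical helper that records the block addresses instead of issuing
cbo.<op>, and it assumes the block size is a power of two, which the mask in
ALT_CMO_OP relies on as well.

```c
#include <stddef.h>

/*
 * User-space model of the loop in ALT_CMO_OP (hypothetical helper).
 * Stores up to max visited block addresses in out and returns how many
 * blocks the loop would touch in total.
 */
static size_t visit_blocks(unsigned long start, size_t size,
			   unsigned long block,
			   unsigned long *out, size_t max)
{
	unsigned long addr = start & ~(block - 1UL);	/* operand %1 */
	unsigned long end = start + size;		/* operand %2 */
	size_t n = 0;

	while (addr < end) {		/* "bltu a0, %2, 3b" */
		if (n < max)
			out[n] = addr;	/* stands in for "cbo.<op> (a0)" */
		n++;
		addr += block;		/* "add a0, a0, %0" */
	}

	return n;
}
```

For example, a start of 0x1010 with a size of 0x50 and a 0x40-byte block is
rounded down to 0x1000 and touches the blocks at 0x1000 and 0x1040.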
As discussed in previous versions, assume the platform
is coherent by default so that non-coherent devices need
to get marked accordingly by firmware.
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Anup Patel <[email protected]>
---
arch/riscv/Kconfig | 31 ++++++++
arch/riscv/Makefile | 4 +
arch/riscv/include/asm/cache.h | 4 +
arch/riscv/include/asm/cacheflush.h | 10 +++
arch/riscv/include/asm/errata_list.h | 19 ++++-
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpu.c | 1 +
arch/riscv/kernel/cpufeature.c | 24 ++++++
arch/riscv/kernel/setup.c | 2 +
arch/riscv/mm/Makefile | 1 +
arch/riscv/mm/dma-noncoherent.c | 112 +++++++++++++++++++++++++++
11 files changed, 208 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/mm/dma-noncoherent.c
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 32ffef9f6e5b..897ae28abf81 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -113,6 +113,7 @@ config RISCV
select MODULES_USE_ELF_RELA if MODULES
select MODULE_SECTIONS if MODULES
select OF
+ select OF_DMA_DEFAULT_COHERENT
select OF_EARLY_FLATTREE
select OF_IRQ
select PCI_DOMAINS_GENERIC if PCI
@@ -218,6 +219,14 @@ config PGTABLE_LEVELS
config LOCKDEP_SUPPORT
def_bool y
+config RISCV_DMA_NONCOHERENT
+ bool
+ select ARCH_HAS_DMA_PREP_COHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+ select ARCH_HAS_SYNC_DMA_FOR_CPU
+ select ARCH_HAS_SETUP_DMA_OPS
+ select DMA_DIRECT_REMAP
+
source "arch/riscv/Kconfig.socs"
source "arch/riscv/Kconfig.erratas"
@@ -376,6 +385,28 @@ config RISCV_ISA_SVPBMT
If you don't know what to do here, say Y.
+config CC_HAS_ZICBOM
+ bool
+ default y if 64BIT && $(cc-option,-mabi=lp64 -march=rv64ima_zicbom)
+ default y if 32BIT && $(cc-option,-mabi=ilp32 -march=rv32ima_zicbom)
+
+config RISCV_ISA_ZICBOM
+ bool "Zicbom extension support for non-coherent DMA operation"
+ depends on CC_HAS_ZICBOM
+ depends on !XIP_KERNEL
+ select RISCV_DMA_NONCOHERENT
+ select RISCV_ALTERNATIVE
+ default y
+ help
+ Adds support to dynamically detect the presence of the ZICBOM
+ extension (Cache Block Management Operations) and enable its
+ usage.
+
+ The Zicbom extension can be used to handle for example
+ non-coherent DMA support on devices that need it.
+
+ If you don't know what to do here, say Y.
+
config FPU
bool "FPU support"
default y
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 34cf8a598617..fbaabc98b3d2 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -56,6 +56,10 @@ riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c
toolchain-need-zicsr-zifencei := $(call cc-option-yn, -march=$(riscv-march-y)_zicsr_zifencei)
riscv-march-$(toolchain-need-zicsr-zifencei) := $(riscv-march-y)_zicsr_zifencei
+# Check if the toolchain supports Zicbom extension
+toolchain-supports-zicbom := $(call cc-option-yn, -march=$(riscv-march-y)_zicbom)
+riscv-march-$(toolchain-supports-zicbom) := $(riscv-march-y)_zicbom
+
KBUILD_CFLAGS += -march=$(subst fd,,$(riscv-march-y))
KBUILD_AFLAGS += -march=$(riscv-march-y)
diff --git a/arch/riscv/include/asm/cache.h b/arch/riscv/include/asm/cache.h
index 9b58b104559e..d3036df23ccb 100644
--- a/arch/riscv/include/asm/cache.h
+++ b/arch/riscv/include/asm/cache.h
@@ -11,6 +11,10 @@
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
+#ifdef CONFIG_RISCV_DMA_NONCOHERENT
+#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
+#endif
+
/*
* RISC-V requires the stack pointer to be 16-byte aligned, so ensure that
* the flat loader aligns it accordingly.
diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 23ff70350992..a60acaecfeda 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -42,6 +42,16 @@ void flush_icache_mm(struct mm_struct *mm, bool local);
#endif /* CONFIG_SMP */
+#ifdef CONFIG_RISCV_ISA_ZICBOM
+void riscv_init_cbom_blocksize(void);
+#else
+static inline void riscv_init_cbom_blocksize(void) { }
+#endif
+
+#ifdef CONFIG_RISCV_DMA_NONCOHERENT
+void riscv_noncoherent_supported(void);
+#endif
+
/*
* Bits in sys_riscv_flush_icache()'s flags argument.
*/
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 398e351e7002..79d89aeeaa6c 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -20,7 +20,8 @@
#endif
#define CPUFEATURE_SVPBMT 0
-#define CPUFEATURE_NUMBER 1
+#define CPUFEATURE_ZICBOM 1
+#define CPUFEATURE_NUMBER 2
#ifdef __ASSEMBLY__
@@ -87,6 +88,22 @@ asm volatile(ALTERNATIVE( \
#define ALT_THEAD_PMA(_val)
#endif
+#define ALT_CMO_OP(_op, _start, _size, _cachesize) \
+asm volatile(ALTERNATIVE( \
+ __nops(5), \
+ "mv a0, %1\n\t" \
+ "j 2f\n\t" \
+ "3:\n\t" \
+ "cbo." __stringify(_op) " (a0)\n\t" \
+ "add a0, a0, %0\n\t" \
+ "2:\n\t" \
+ "bltu a0, %2, 3b\n\t", 0, \
+ CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM) \
+ : : "r"(_cachesize), \
+ "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
+ "r"((unsigned long)(_start) + (_size)) \
+ : "a0")
+
#endif /* __ASSEMBLY__ */
#endif
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 4e2486881840..6044e402003d 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -53,6 +53,7 @@ extern unsigned long elf_hwcap;
enum riscv_isa_ext_id {
RISCV_ISA_EXT_SSCOFPMF = RISCV_ISA_EXT_BASE,
RISCV_ISA_EXT_SVPBMT,
+ RISCV_ISA_EXT_ZICBOM,
RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
};
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index fba9e9f46a8c..0365557f7122 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -89,6 +89,7 @@ int riscv_of_parent_hartid(struct device_node *node)
static struct riscv_isa_ext_data isa_ext_arr[] = {
__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
+ __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
};
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 6a40cb8134bd..d01a792a7201 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -12,6 +12,7 @@
#include <linux/module.h>
#include <linux/of.h>
#include <asm/alternative.h>
+#include <asm/cacheflush.h>
#include <asm/errata_list.h>
#include <asm/hwcap.h>
#include <asm/patch.h>
@@ -199,6 +200,7 @@ void __init riscv_fill_hwcap(void)
} else {
SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+ SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
}
#undef SET_ISA_EXT_MAP
}
@@ -259,6 +261,25 @@ static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
return false;
}
+static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
+{
+#ifdef CONFIG_RISCV_ISA_ZICBOM
+ switch (stage) {
+ case RISCV_ALTERNATIVES_EARLY_BOOT:
+ return false;
+ default:
+ if (riscv_isa_extension_available(NULL, ZICBOM)) {
+ riscv_noncoherent_supported();
+ return true;
+ } else {
+ return false;
+ }
+ }
+#endif
+
+ return false;
+}
+
/*
* Probe presence of individual extensions.
*
@@ -273,6 +294,9 @@ static u32 __init_or_module cpufeature_probe(unsigned int stage)
if (cpufeature_probe_svpbmt(stage))
cpu_req_feature |= (1U << CPUFEATURE_SVPBMT);
+ if (cpufeature_probe_zicbom(stage))
+ cpu_req_feature |= (1U << CPUFEATURE_ZICBOM);
+
return cpu_req_feature;
}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f0f36a4a0e9b..95ef6e2bf45c 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -22,6 +22,7 @@
#include <linux/crash_dump.h>
#include <asm/alternative.h>
+#include <asm/cacheflush.h>
#include <asm/cpu_ops.h>
#include <asm/early_ioremap.h>
#include <asm/pgtable.h>
@@ -296,6 +297,7 @@ void __init setup_arch(char **cmdline_p)
#endif
riscv_fill_hwcap();
+ riscv_init_cbom_blocksize();
apply_boot_alternatives();
}
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index ac7a25298a04..d76aabf4b94d 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -30,3 +30,4 @@ endif
endif
obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
+obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
new file mode 100644
index 000000000000..a8dc0bd9078d
--- /dev/null
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V specific functions to support DMA for non-coherent devices
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include <linux/dma-direct.h>
+#include <linux/dma-map-ops.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <asm/cacheflush.h>
+
+static unsigned int riscv_cbom_block_size = L1_CACHE_BYTES;
+static bool noncoherent_supported;
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
+{
+ void *vaddr = phys_to_virt(paddr);
+
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+ break;
+ case DMA_FROM_DEVICE:
+ ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+ break;
+ case DMA_BIDIRECTIONAL:
+ ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+ break;
+ default:
+ break;
+ }
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
+{
+ void *vaddr = phys_to_virt(paddr);
+
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ break;
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
+ break;
+ default:
+ break;
+ }
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+ void *flush_addr = page_address(page);
+
+ ALT_CMO_OP(flush, flush_addr, size, riscv_cbom_block_size);
+}
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+ const struct iommu_ops *iommu, bool coherent)
+{
+ WARN_TAINT(!coherent && riscv_cbom_block_size > ARCH_DMA_MINALIGN,
+ TAINT_CPU_OUT_OF_SPEC,
+ "%s %s: ARCH_DMA_MINALIGN smaller than riscv,cbom-block-size (%d < %d)",
+ dev_driver_string(dev), dev_name(dev),
+ ARCH_DMA_MINALIGN, riscv_cbom_block_size);
+
+ WARN_TAINT(!coherent && !noncoherent_supported, TAINT_CPU_OUT_OF_SPEC,
+ "%s %s: device non-coherent but no non-coherent operations supported",
+ dev_driver_string(dev), dev_name(dev));
+
+ dev->dma_coherent = coherent;
+}
+
+#ifdef CONFIG_RISCV_ISA_ZICBOM
+void riscv_init_cbom_blocksize(void)
+{
+ struct device_node *node;
+ int ret;
+ u32 val;
+
+ for_each_of_cpu_node(node) {
+ int hartid = riscv_of_processor_hartid(node);
+ int cbom_hartid;
+
+ if (hartid < 0)
+ continue;
+
+ /* set block-size for cbom extension if available */
+ ret = of_property_read_u32(node, "riscv,cbom-block-size", &val);
+ if (ret)
+ continue;
+
+ if (!riscv_cbom_block_size) {
+ riscv_cbom_block_size = val;
+ cbom_hartid = hartid;
+ } else {
+ if (riscv_cbom_block_size != val)
+ pr_warn("cbom-block-size mismatched between harts %d and %d\n",
+ cbom_hartid, hartid);
+ }
+ }
+}
+#endif
+
+void riscv_noncoherent_supported(void)
+{
+ noncoherent_supported = true;
+}
--
2.35.1
The T-Head C906 and C910 implement a scheme for handling
cache operations different from the generic Zicbom extension.
Add an errata for it next to the generic dma coherency ops.
Reviewed-by: Samuel Holland <[email protected]>
Tested-by: Samuel Holland <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Signed-off-by: Heiko Stuebner <[email protected]>
---
arch/riscv/Kconfig.erratas | 11 +++++++
arch/riscv/errata/thead/errata.c | 20 ++++++++++++
arch/riscv/include/asm/errata_list.h | 48 +++++++++++++++++++++++++---
3 files changed, 74 insertions(+), 5 deletions(-)
diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
index 457ac72c9b36..3223e533fd87 100644
--- a/arch/riscv/Kconfig.erratas
+++ b/arch/riscv/Kconfig.erratas
@@ -55,4 +55,15 @@ config ERRATA_THEAD_PBMT
If you don't know what to do here, say "Y".
+config ERRATA_THEAD_CMO
+ bool "Apply T-Head cache management errata"
+ depends on ERRATA_THEAD
+ select RISCV_DMA_NONCOHERENT
+ default y
+ help
+ This will apply the cache management errata to handle the
+ non-standard handling on non-coherent operations on T-Head SoCs.
+
+ If you don't know what to do here, say "Y".
+
endmenu
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index b37b6fedd53b..202c83f677b2 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -27,6 +27,23 @@ static bool errata_probe_pbmt(unsigned int stage,
return false;
}
+static bool errata_probe_cmo(unsigned int stage,
+ unsigned long arch_id, unsigned long impid)
+{
+#ifdef CONFIG_ERRATA_THEAD_CMO
+ if (arch_id != 0 || impid != 0)
+ return false;
+
+ if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
+ return false;
+
+ riscv_noncoherent_supported();
+ return true;
+#else
+ return false;
+#endif
+}
+
static u32 thead_errata_probe(unsigned int stage,
unsigned long archid, unsigned long impid)
{
@@ -35,6 +52,9 @@ static u32 thead_errata_probe(unsigned int stage,
if (errata_probe_pbmt(stage, archid, impid))
cpu_req_errata |= (1U << ERRATA_THEAD_PBMT);
+ if (errata_probe_cmo(stage, archid, impid))
+ cpu_req_errata |= (1U << ERRATA_THEAD_CMO);
+
return cpu_req_errata;
}
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 79d89aeeaa6c..19a771085781 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -16,7 +16,8 @@
#ifdef CONFIG_ERRATA_THEAD
#define ERRATA_THEAD_PBMT 0
-#define ERRATA_THEAD_NUMBER 1
+#define ERRATA_THEAD_CMO 1
+#define ERRATA_THEAD_NUMBER 2
#endif
#define CPUFEATURE_SVPBMT 0
@@ -88,17 +89,54 @@ asm volatile(ALTERNATIVE( \
#define ALT_THEAD_PMA(_val)
#endif
+/*
+ * dcache.ipa rs1 (invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01010 rs1 000 00000 0001011
+ * dcache.iva rs1 (invalidate, virtual address)
+ * 0000001 00110 rs1 000 00000 0001011
+ *
+ * dcache.cpa rs1 (clean, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01001 rs1 000 00000 0001011
+ * dcache.cva rs1 (clean, virtual address)
+ * 0000001 00100 rs1 000 00000 0001011
+ *
+ * dcache.cipa rs1 (clean then invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000001 01011 rs1 000 00000 0001011
+ * dcache.civa rs1 (... virtual address)
+ * 0000001 00111 rs1 000 00000 0001011
+ *
+ * sync.s (make sure all cache operations finished)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ * 0000000 11001 00000 000 00000 0001011
+ */
+#define THEAD_inval_A0 ".long 0x0265000b"
+#define THEAD_clean_A0 ".long 0x0245000b"
+#define THEAD_flush_A0 ".long 0x0275000b"
+#define THEAD_SYNC_S ".long 0x0190000b"
+
#define ALT_CMO_OP(_op, _start, _size, _cachesize) \
-asm volatile(ALTERNATIVE( \
- __nops(5), \
+asm volatile(ALTERNATIVE_2( \
+ __nops(6), \
"mv a0, %1\n\t" \
"j 2f\n\t" \
"3:\n\t" \
"cbo." __stringify(_op) " (a0)\n\t" \
"add a0, a0, %0\n\t" \
"2:\n\t" \
- "bltu a0, %2, 3b\n\t", 0, \
- CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM) \
+ "bltu a0, %2, 3b\n\t" \
+ "nop", 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM, \
+ "mv a0, %1\n\t" \
+ "j 2f\n\t" \
+ "3:\n\t" \
+ THEAD_##_op##_A0 "\n\t" \
+ "add a0, a0, %0\n\t" \
+ "2:\n\t" \
+ "bltu a0, %2, 3b\n\t" \
+ THEAD_SYNC_S, THEAD_VENDOR_ID, \
+ ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \
: : "r"(_cachesize), \
"r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
"r"((unsigned long)(_start) + (_size)) \
--
2.35.1
On Thursday, 7 July 2022, 01:15:35 CEST, Heiko Stuebner wrote:
> The Zicbom ISA-extension was ratified in November 2021
> and introduces instructions for dcache invalidate, clean
> and flush operations.
>
> Implement cache management operations for non-coherent devices
> based on them.
>
> Of course not all cores will support this, so implement an
> alternative-based mechanism that replaces nops with a
> sequence built around the Zicbom instructions.
>
> As discussed in previous versions, assume the platform
> is coherent by default so that non-coherent devices need
> to get marked accordingly by firmware.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Heiko Stuebner <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Atish Patra <[email protected]>
> Cc: Guo Ren <[email protected]>
> Cc: Anup Patel <[email protected]>
Over in v6 Guo provided [0] a
Reviewed-by: Guo Ren <[email protected]>
[0] https://lore.kernel.org/r/CAJF2gTRN1J3edjbt5L9ELLtMzXKWUABQb=QxDA90uY7mj=O0rw@mail.gmail.com
On Wed, 06 Jul 2022 16:15:36 PDT (-0700), [email protected] wrote:
> The T-Head C906 and C910 implement a scheme for handling
> cache operations different from the generic Zicbom extension.
>
> Add an errata for it next to the generic dma coherency ops.
>
> Reviewed-by: Samuel Holland <[email protected]>
> Tested-by: Samuel Holland <[email protected]>
> Reviewed-by: Guo Ren <[email protected]>
> Signed-off-by: Heiko Stuebner <[email protected]>
> ---
> arch/riscv/Kconfig.erratas | 11 +++++++
> arch/riscv/errata/thead/errata.c | 20 ++++++++++++
> arch/riscv/include/asm/errata_list.h | 48 +++++++++++++++++++++++++---
> 3 files changed, 74 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> index 457ac72c9b36..3223e533fd87 100644
> --- a/arch/riscv/Kconfig.erratas
> +++ b/arch/riscv/Kconfig.erratas
> @@ -55,4 +55,15 @@ config ERRATA_THEAD_PBMT
>
> If you don't know what to do here, say "Y".
>
> +config ERRATA_THEAD_CMO
> + bool "Apply T-Head cache management errata"
> + depends on ERRATA_THEAD
> + select RISCV_DMA_NONCOHERENT
> + default y
> + help
> + This will apply the cache management errata to handle the
> + non-standard handling on non-coherent operations on T-Head SoCs.
> +
> + If you don't know what to do here, say "Y".
> +
> endmenu
> diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> index b37b6fedd53b..202c83f677b2 100644
> --- a/arch/riscv/errata/thead/errata.c
> +++ b/arch/riscv/errata/thead/errata.c
> @@ -27,6 +27,23 @@ static bool errata_probe_pbmt(unsigned int stage,
> return false;
> }
>
> +static bool errata_probe_cmo(unsigned int stage,
> + unsigned long arch_id, unsigned long impid)
> +{
> +#ifdef CONFIG_ERRATA_THEAD_CMO
> + if (arch_id != 0 || impid != 0)
> + return false;
> +
> + if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> + return false;
> +
> + riscv_noncoherent_supported();
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> static u32 thead_errata_probe(unsigned int stage,
> unsigned long archid, unsigned long impid)
> {
> @@ -35,6 +52,9 @@ static u32 thead_errata_probe(unsigned int stage,
> if (errata_probe_pbmt(stage, archid, impid))
> cpu_req_errata |= (1U << ERRATA_THEAD_PBMT);
>
> + if (errata_probe_cmo(stage, archid, impid))
> + cpu_req_errata |= (1U << ERRATA_THEAD_CMO);
> +
> return cpu_req_errata;
> }
>
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 79d89aeeaa6c..19a771085781 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -16,7 +16,8 @@
>
> #ifdef CONFIG_ERRATA_THEAD
> #define ERRATA_THEAD_PBMT 0
> -#define ERRATA_THEAD_NUMBER 1
> +#define ERRATA_THEAD_CMO 1
> +#define ERRATA_THEAD_NUMBER 2
> #endif
>
> #define CPUFEATURE_SVPBMT 0
> @@ -88,17 +89,54 @@ asm volatile(ALTERNATIVE( \
> #define ALT_THEAD_PMA(_val)
> #endif
>
> +/*
> + * dcache.ipa rs1 (invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01010 rs1 000 00000 0001011
> + * dcache.iva rs1 (invalidate, virtual address)
> + * 0000001 00110 rs1 000 00000 0001011
> + *
> + * dcache.cpa rs1 (clean, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01001 rs1 000 00000 0001011
> + * dcache.cva rs1 (clean, virtual address)
> + * 0000001 00100 rs1 000 00000 0001011
> + *
> + * dcache.cipa rs1 (clean then invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01011 rs1 000 00000 0001011
> + * dcache.civa rs1 (... virtual address)
> + * 0000001 00111 rs1 000 00000 0001011
> + *
> + * sync.s (make sure all cache operations finished)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000000 11001 00000 000 00000 0001011
> + */
> +#define THEAD_inval_A0 ".long 0x0265000b"
> +#define THEAD_clean_A0 ".long 0x0245000b"
> +#define THEAD_flush_A0 ".long 0x0275000b"
> +#define THEAD_SYNC_S ".long 0x0190000b"
I'm not sure what to do with these: I really don't want to have a bunch
of raw binary instruction encodings floating around, but it looks like
the T-Head folks want to re-write their ISA manual before merging the
GAS support for it which means we'd be stuck going another release cycle
(and presumably another year of LTS) before getting the hardware
supported. It really seems like we're just going in circles here trying
to get everything lined up, and it's getting silly blocking real
hardware from working because of a little bit of ugliness.
I know I said I really don't want the executable .long stuff for this,
and IIRC that's a pretty common sentiment. I'm not sure if I'm just fed
up with all the craziness, but I'm kind of inclined to just merge this
as-is -- at least that way we can get the hardware working. In the long
run I think we're going to end up with some much uglier errata, so I
doubt we'll be all that worried about this one later.
That said, I'll give folks some time to chime in as IIRC this has been
pointed out a handful of times.
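For reference, the four .long constants follow directly from the field
layout in the quoted comment. The following is a minimal user-space sketch
that rebuilds them; thead_encode() and REG_A0 are hypothetical names, and it
assumes rs1 is a0 (x10), as the _A0 suffix of the macros suggests.

```c
#include <stdint.h>

/* a0 is register x10 in the RISC-V calling convention. */
enum { REG_A0 = 10 };

/*
 * Hypothetical helper: pack one instruction word from the fields listed
 * in the comment. Bits 14-12 and 11-7 are zero for all four instructions,
 * and bits 6-0 are always 0001011 (0x0b).
 */
static uint32_t thead_encode(uint32_t bits_31_25, uint32_t bits_24_20,
			     uint32_t rs1)
{
	return (bits_31_25 << 25) | (bits_24_20 << 20) | (rs1 << 15) | 0x0bu;
}
```

With these fields, dcache.iva (00110), dcache.cva (00100) and dcache.civa
(00111) on a0 give 0x0265000b, 0x0245000b and 0x0275000b, and sync.s (11001
with rs1 = 0) gives 0x0190000b, matching the defines in the patch.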
> +
> #define ALT_CMO_OP(_op, _start, _size, _cachesize) \
> -asm volatile(ALTERNATIVE( \
> - __nops(5), \
> +asm volatile(ALTERNATIVE_2( \
> + __nops(6), \
> "mv a0, %1\n\t" \
> "j 2f\n\t" \
> "3:\n\t" \
> "cbo." __stringify(_op) " (a0)\n\t" \
> "add a0, a0, %0\n\t" \
> "2:\n\t" \
> - "bltu a0, %2, 3b\n\t", 0, \
> - CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM) \
> + "bltu a0, %2, 3b\n\t" \
> + "nop", 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM, \
> + "mv a0, %1\n\t" \
> + "j 2f\n\t" \
> + "3:\n\t" \
> + THEAD_##_op##_A0 "\n\t" \
> + "add a0, a0, %0\n\t" \
> + "2:\n\t" \
> + "bltu a0, %2, 3b\n\t" \
> + THEAD_SYNC_S, THEAD_VENDOR_ID, \
> + ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \
> : : "r"(_cachesize), \
> "r"((unsigned long)(_start) & ~((_cachesize) - 1UL)), \
> "r"((unsigned long)(_start) + (_size)) \
On Thu, Aug 4, 2022 at 2:28 AM Palmer Dabbelt <[email protected]> wrote:
> I know I said I really don't want the executable .long stuff for this,
> and IIRC that's a pretty common sentiment. I'm not sure if I'm just fed
> up with all the craziness, but I'm kind of inclined to just merge this
> as-is -- at least that way we can get the hardware working.
There is usually not much choice here if you want to allow building
with older toolchains. You might want to add a comment for each one
of those to reference the (projected) binutils version that adds support
so it can be cleaned up after you raise the minimum toolchain
requirements, but that takes years.
Arnd
On Wed, 06 Jul 2022 16:15:32 PDT (-0700), [email protected] wrote:
> This series is based on the alternatives changes done in my svpbmt series
> and thus also depends on Atish's isa-extension parsing series.
>
> It implements using the cache-management instructions from the Zicbom-
> extension to handle cache flush, etc actions on platforms needing them.
>
> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
> different set of cache instructions. But while the instructions are
> different, they provide the same functionality, so a variant can
> easily hook into the existing alternatives mechanism on those.
>
>
> An ongoing discussion is about the currently used pre-coded
> instructions. Palmer's current thinking is that we should wait
> until the relevant instructions have landed in binutils.
>
> The main Zicbom instructions are in toolchains now and at least
> Debian also carries a binutils snapshot with it, but the T-Head
> variant still uses pre-coded instructions for now.
>
> The series sits on top of my svpbmt fixup series, which
> for example includes the conversion away from function pointers
> for the check-functions.
>
>
> It also uses my nops-series to shorten multiple nop statements:
> https://lore.kernel.org/r/[email protected]
>
>
> A new dma-noncoherent property was added for the devicetree-specification
> and dt-schema in:
> - https://www.spinics.net/lists/devicetree-spec/msg01053.html
> - https://github.com/devicetree-org/dt-schema/pull/78
>
> The dtschema-patch was already merged and patch1 in this series
> got a reviewed-by from Rob, so I guess that new property should be
> ok to use.
>
> changes in v7:
> - add recently received review-tags
> - fix wrong rv32 mabi when testing for Zicbom in Kconfig
>
> changes in v6:
> - add recently received review-tags
> - adapt non-coherent patch subject as suggested by Christoph Hellwig
>
> changes in v5:
> - beautify of_dma_is_coherent as suggested by Christoph Hellwig
> - WARN_TAINT when ARCH_DMA_MINALIGN smaller than riscv,cbom-block-size
> (similar to how arm64 does this)
> - add a function to track if non-coherent handling is available
> - WARN_TAINT if a device is non-coherent but no non-coherent handling
> - use clean instead of inval in arch_sync_dma_for_device:DMA_FROM_DEVICE
> hopefully I understood
> https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
> correctly in this
>
> changes in v4:
> - modify of_dma_is_coherent() to also handle coherent systems
> with possibly non-coherent devices
> - move Zicbom to use real instructions
> - split off the actual dma-noncoherent code from the Zicbom
> extension
> - Don't assume devices are non-coherent, instead default to
> coherent and require the non-coherent ones to be marked
> - CPUFEATURE_ZICBOM instead of CPUFEATURE_CMO
> - fix used cache addresses
> - drop some unused headers from dma-noncoherent.c
> - move unsigned long cast when calling ALT_CMO_OP
> - remove unneeded memset-0
> - define ARCH_DMA_MINALIGN
> - use flush instead of inval in arch_sync_dma_for_cpu()
> - depend on !XIP_KERNEL
> - trim some line lengths
> - improve Kconfig description
>
> changes in v3:
> - rebase onto 5.19-rc1 + svpbmt-fixup-series
> - adapt wording for block-size binding
> - include asm/cacheflush.h into dma-noncoherent to fix the
> no-prototype error clang seems to generate
> - use __nops macro for readability
> - add some received tags
> - add a0 to the clobber list
>
> changes in v2:
> - cbom-block-size is hardware-specific and comes from firmware
> - update Kconfig name to use the ISA extension name
> - select the ALTERNATIVES symbol when enabled
> - shorten the line lengths of the errata-assembly
>
> Heiko Stuebner (4):
> of: also handle dma-noncoherent in of_dma_is_coherent()
> dt-bindings: riscv: document cbom-block-size
> riscv: Add support for non-coherent devices using zicbom extension
> riscv: implement cache-management errata for T-Head SoCs
>
> .../devicetree/bindings/riscv/cpus.yaml | 5 +
> arch/riscv/Kconfig | 31 +++++
> arch/riscv/Kconfig.erratas | 11 ++
> arch/riscv/Makefile | 4 +
> arch/riscv/errata/thead/errata.c | 20 ++++
> arch/riscv/include/asm/cache.h | 4 +
> arch/riscv/include/asm/cacheflush.h | 10 ++
> arch/riscv/include/asm/errata_list.h | 59 ++++++++-
> arch/riscv/include/asm/hwcap.h | 1 +
> arch/riscv/kernel/cpu.c | 1 +
> arch/riscv/kernel/cpufeature.c | 24 ++++
> arch/riscv/kernel/setup.c | 2 +
> arch/riscv/mm/Makefile | 1 +
> arch/riscv/mm/dma-noncoherent.c | 112 ++++++++++++++++++
> drivers/of/address.c | 17 +--
> 15 files changed, 293 insertions(+), 9 deletions(-)
> create mode 100644 arch/riscv/mm/dma-noncoherent.c
Thanks, this is on for-next. I had to fix up a few things, nothing big
but I did end up making Zicbom depend on MMU -- that's probably not
strictly necessary, but it looks like the dma_noncoherent stuff pulls in
some MMU dependencies. Since the only hardware that has Zicbom also has
an MMU I figured it's OK for now, but happy to take an improvement if
someone has one.
Since there's a new extension, it also requires updating sparse; I sent a
patch (linked in the merge commit).