2021-05-19 19:15:11

by Guo Ren

[permalink] [raw]
Subject: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: Guo Ren <[email protected]>

The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let
vendors define the custom properties of memory regions in PTE.

This patchset helps SOC vendors to support their own custom interconnect
coherent solution with PTE attributes.

For example, allwinner D1[1] uses T-HEAD C906 as main processor, C906 has
two modes in MMU:
- Compatible mode, the same as the definitions in spec.
- Enhanced mode, add custom DMA_COHERENT attribute bits in PTE which
not mentioned in spec.

Allwinner D1 needs the enhanced mode to support the DMA type device with
non-coherent interconnect in its SOC. C906 uses BITS(63 - 59) as custom
attribute bits in PTE.

Here is the config example for Allwinner D1:
CONFIG_RISCV_DMA_COHERENT=y
CONFIG_RISCV_PAGE_DMA_MASK=0xf800000000000000
CONFIG_RISCV_PAGE_CACHE=0x7000000000000000
CONFIG_RISCV_PAGE_DMA_NONCACHE=0x8000000000000000

Link: https://linux-sunxi.org/D1 [1]

Guo Ren (3):
riscv: pgtable.h: Fixup _PAGE_CHG_MASK usage
riscv: Add DMA_COHERENT for custom PTE attributes
riscv: Add SYNC_DMA_FOR_CPU/DEVICE for DMA_COHERENT

arch/riscv/Kconfig | 31 ++++++++++++++++++++++++++
arch/riscv/include/asm/pgtable-64.h | 8 ++++---
arch/riscv/include/asm/pgtable-bits.h | 13 ++++++++++-
arch/riscv/include/asm/pgtable.h | 26 +++++++++++++++++-----
arch/riscv/include/asm/sbi.h | 16 ++++++++++++++
arch/riscv/kernel/sbi.c | 19 ++++++++++++++++
arch/riscv/mm/Makefile | 4 ++++
arch/riscv/mm/dma-mapping.c | 41 +++++++++++++++++++++++++++++++++++
8 files changed, 148 insertions(+), 10 deletions(-)
create mode 100644 arch/riscv/mm/dma-mapping.c

--
2.7.4



2021-05-19 19:15:15

by Guo Ren

[permalink] [raw]
Subject: [PATCH RFC 2/3] riscv: Add DMA_COHERENT for custom PTE attributes

From: Guo Ren <[email protected]>

The RISC-V ISA doesn't yet specify how to query or modify PMAs, so
let vendors define the custom properties of memory regions in PTE.

That means address attributes would use PTE entry not PMA to meet
the different requirements of IO/mem.

The patch helps SOC vendors to support their own custom
interconnect coherent solution with PTE attributes.

Signed-off-by: Guo Ren <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Drew Fustini <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Wei Fu <[email protected]>
Cc: Wei Wu <[email protected]>
---
arch/riscv/Kconfig | 27 +++++++++++++++++++++++++++
arch/riscv/include/asm/pgtable-bits.h | 13 ++++++++++++-
arch/riscv/include/asm/pgtable.h | 7 ++++---
3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a8ad8eb..632fac5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -376,6 +376,33 @@ config FPU

If you don't know what to do here, say Y.

+config RISCV_DMA_COHERENT
+ bool "Custom DMA coherent support"
+ depends on MMU
+ help
+ Help SOC vendors to support their own custom interconnect coherent
+ solution with PTE attributes.
+
+ The RISC-V ISA doesn't yet specify how to query or modify PMAs, so let
+ vendors define the custom properties of memory regions in PTE.
+
+ If you don't know what to do here, say N.
+
+config RISCV_PAGE_DMA_MASK
+ hex "Custom DMA attributes' mask bits in pte"
+ depends on RISCV_DMA_COHERENT
+ default "0x0"
+
+config RISCV_PAGE_CACHE
+ hex "Custom CACHE attribute bits in pte"
+ depends on RISCV_DMA_COHERENT
+ default "0x0"
+
+config RISCV_PAGE_DMA_NONCACHE
+ hex "Custom NONCACHE attribute bits in pte"
+ depends on RISCV_DMA_COHERENT
+ default "0x0"
+
endmenu

menu "Kernel features"
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index bbaeb5d..071c5dc 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -24,6 +24,16 @@
#define _PAGE_DIRTY (1 << 7) /* Set by hardware on any write */
#define _PAGE_SOFT (1 << 8) /* Reserved for software */

+#ifdef CONFIG_RISCV_DMA_COHERENT
+#define _PAGE_DMA_MASK CONFIG_RISCV_PAGE_DMA_MASK
+#define _PAGE_CACHE CONFIG_RISCV_PAGE_CACHE
+#define _PAGE_DMA_NONCACHE CONFIG_RISCV_PAGE_DMA_NONCACHE
+#else
+#define _PAGE_DMA_MASK (0UL)
+#define _PAGE_CACHE (0UL)
+#define _PAGE_DMA_NONCACHE (0UL)
+#endif
+
#define _PAGE_SPECIAL _PAGE_SOFT
#define _PAGE_TABLE _PAGE_PRESENT

@@ -38,6 +48,7 @@
/* Set of bits to preserve across pte_modify() */
#define _PAGE_CHG_MASK (~(unsigned long)(_PAGE_PRESENT | _PAGE_READ | \
_PAGE_WRITE | _PAGE_EXEC | \
- _PAGE_USER | _PAGE_GLOBAL))
+ _PAGE_USER | _PAGE_GLOBAL | \
+ _PAGE_DMA_MASK))

#endif /* _ASM_RISCV_PGTABLE_BITS_H */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 869d6bf..f822f22 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -114,7 +114,7 @@
#define USER_PTRS_PER_PGD (TASK_SIZE / PGDIR_SIZE)

/* Page protection bits */
-#define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER)
+#define _PAGE_BASE (_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER | _PAGE_CACHE)

#define PAGE_NONE __pgprot(_PAGE_PROT_NONE)
#define PAGE_READ __pgprot(_PAGE_BASE | _PAGE_READ)
@@ -134,7 +134,8 @@
| _PAGE_WRITE \
| _PAGE_PRESENT \
| _PAGE_ACCESSED \
- | _PAGE_DIRTY)
+ | _PAGE_DIRTY \
+ | _PAGE_CACHE)

#define PAGE_KERNEL __pgprot(_PAGE_KERNEL)
#define PAGE_KERNEL_READ __pgprot(_PAGE_KERNEL & ~_PAGE_WRITE)
@@ -148,7 +149,7 @@
* The RISC-V ISA doesn't yet specify how to query or modify PMAs, so we can't
* change the properties of memory regions.
*/
-#define _PAGE_IOREMAP _PAGE_KERNEL
+#define _PAGE_IOREMAP ((_PAGE_KERNEL & ~_PAGE_DMA_MASK) | _PAGE_DMA_NONCACHE)

extern pgd_t swapper_pg_dir[];

--
2.7.4


2021-05-19 19:15:19

by Guo Ren

[permalink] [raw]
Subject: [PATCH RFC 3/3] riscv: Add SYNC_DMA_FOR_CPU/DEVICE for DMA_COHERENT

From: Guo Ren <[email protected]>

To support DMA device in a non-coherent interconnect SOC system,
we need the below facilities:
- Change a memory region attributes from cacheable to strong.
It would be used in DMA descriptors.
- Sync the cache with memory before DMA start and after DMA end.
It would be used for DMA data transfer buffers.

This patch enables kernel dma/direct.c coherent infrastructure and
a new sbi_ecall API for dma_sync.

Signed-off-by: Guo Ren <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Drew Fustini <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Wei Fu <[email protected]>
Cc: Wei Wu <[email protected]>
---
arch/riscv/Kconfig | 4 ++++
arch/riscv/include/asm/pgtable.h | 13 +++++++++++++
arch/riscv/include/asm/sbi.h | 16 ++++++++++++++++
arch/riscv/kernel/sbi.c | 19 +++++++++++++++++++
arch/riscv/mm/Makefile | 4 ++++
arch/riscv/mm/dma-mapping.c | 41 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 97 insertions(+)
create mode 100644 arch/riscv/mm/dma-mapping.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 632fac5..94a736a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -20,6 +20,9 @@ config RISCV
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DEBUG_WX
+ select ARCH_HAS_DMA_PREP_COHERENT if RISCV_DMA_COHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_CPU if RISCV_DMA_COHERENT
+ select ARCH_HAS_SYNC_DMA_FOR_DEVICE if RISCV_DMA_COHERENT
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
@@ -41,6 +44,7 @@ config RISCV
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
+ select DMA_DIRECT_REMAP if RISCV_DMA_COHERENT
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index f822f22..8994d58 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -465,6 +465,19 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
return ptep_test_and_clear_young(vma, address, ptep);
}

+#ifdef CONFIG_RISCV_DMA_COHERENT
+#define pgprot_noncached pgprot_noncached
+static inline pgprot_t pgprot_noncached(pgprot_t _prot)
+{
+ unsigned long prot = pgprot_val(_prot);
+
+ prot &= ~_PAGE_DMA_MASK;
+ prot |= _PAGE_DMA_NONCACHE;
+
+ return __pgprot(prot);
+}
+#endif
+
/*
* Encode and decode a swap entry
*
diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 0d42693..08b4244 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -27,6 +27,7 @@ enum sbi_ext_id {
SBI_EXT_IPI = 0x735049,
SBI_EXT_RFENCE = 0x52464E43,
SBI_EXT_HSM = 0x48534D,
+ SBI_EXT_DMA = 0xAB150401,
};

enum sbi_ext_base_fid {
@@ -37,6 +38,7 @@ enum sbi_ext_base_fid {
SBI_EXT_BASE_GET_MVENDORID,
SBI_EXT_BASE_GET_MARCHID,
SBI_EXT_BASE_GET_MIMPID,
+ SBI_EXT_RFENCE_REMOTE_DMA_SYNC,
};

enum sbi_ext_time_fid {
@@ -63,6 +65,17 @@ enum sbi_ext_hsm_fid {
SBI_EXT_HSM_HART_STATUS,
};

+enum sbi_ext_dma_fid {
+ SBI_DMA_SYNC = 0,
+};
+
+enum sbi_dma_sync_data_direction {
+ SBI_DMA_BIDIRECTIONAL = 0,
+ SBI_DMA_TO_DEVICE = 1,
+ SBI_DMA_FROM_DEVICE = 2,
+ SBI_DMA_NONE = 3,
+};
+
enum sbi_hsm_hart_status {
SBI_HSM_HART_STATUS_STARTED = 0,
SBI_HSM_HART_STATUS_STOPPED,
@@ -128,6 +141,9 @@ int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
unsigned long size,
unsigned long asid);
int sbi_probe_extension(int ext);
+void sbi_dma_sync(unsigned long start,
+ unsigned long size,
+ enum sbi_dma_sync_data_direction dir);

/* Check if current SBI specification version is 0.1 or not */
static inline int sbi_spec_is_0_1(void)
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 7402a41..ff8e18b 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -521,6 +521,25 @@ int sbi_probe_extension(int extid)
}
EXPORT_SYMBOL(sbi_probe_extension);

+void sbi_dma_sync(unsigned long start,
+ unsigned long size,
+ enum sbi_dma_sync_data_direction dir)
+{
+#if 0
+ sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir,
+ 0, 0, 0);
+#else
+ /* Just for try, it should be in sbi ecall and will be removed before merged */
+ register unsigned long i asm("a0") = start & ~(L1_CACHE_BYTES - 1);
+
+ for (; i < (start + size); i += L1_CACHE_BYTES)
+ __asm__ __volatile__(".long 0x02b5000b");
+
+ __asm__ __volatile__(".long 0x01b0000b");
+#endif
+}
+EXPORT_SYMBOL(sbi_dma_sync);
+
static long __sbi_base_ecall(int fid)
{
struct sbiret ret;
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 7ebaef1..b67d956 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -14,6 +14,10 @@ obj-$(CONFIG_MMU) += fault.o pageattr.o
obj-y += cacheflush.o
obj-y += context.o

+ifeq ($(CONFIG_RISCV_DMA_COHERENT), y)
+obj-y += dma-mapping.o
+endif
+
ifeq ($(CONFIG_MMU),y)
obj-$(CONFIG_SMP) += tlbflush.o
endif
diff --git a/arch/riscv/mm/dma-mapping.c b/arch/riscv/mm/dma-mapping.c
new file mode 100644
index 00000000..f5db60b
--- /dev/null
+++ b/arch/riscv/mm/dma-mapping.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/dma-map-ops.h>
+#include <asm/sbi.h>
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+ void *ptr = page_address(page);
+
+ memset(ptr, 0, size);
+ sbi_dma_sync(page_to_phys(page), size, SBI_DMA_BIDIRECTIONAL);
+}
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ sbi_dma_sync(paddr, size, dir);
+ break;
+ default:
+ BUG();
+ }
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_TO_DEVICE:
+ return;
+ case DMA_FROM_DEVICE:
+ case DMA_BIDIRECTIONAL:
+ sbi_dma_sync(paddr, size, dir);
+ break;
+ default:
+ BUG();
+ }
+}
--
2.7.4


2021-05-19 19:16:37

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 01:48:23PM +0800, Guo Ren wrote:
> The patchset just leaves a configuration chance for vendors. Before
> RISC-V ISA fixes it, we should give the chance to let vendor solve
> their real chip issues.

No. The vendors need to work to get a feature standardized before
implementing it. There is other way to have a sane kernel build that
supports all the different SOCs.

2021-05-19 19:17:59

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] riscv: Add SYNC_DMA_FOR_CPU/DEVICE for DMA_COHERENT

On Wed, May 19, 2021 at 1:05 PM <[email protected]> wrote:
>
> From: Guo Ren <[email protected]>
>
> To support DMA device in a non-coherent interconnect SOC system,
> we need the below facilities:
> - Change a memory region attributes from cacheable to strong.
> It would be used in DMA descriptors.
> - Sync the cache with memory before DMA start and after DMA end.
> It would be used for DMA data transfer buffers.
>
> This patch enables kernel dma/direct.c coherent infrastructure and
> a new sbi_ecall API for dma_sync.
>
> Signed-off-by: Guo Ren <[email protected]>
> Cc: Anup Patel <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Drew Fustini <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Wei Fu <[email protected]>
> Cc: Wei Wu <[email protected]>
> ---
> arch/riscv/Kconfig | 4 ++++
> arch/riscv/include/asm/pgtable.h | 13 +++++++++++++
> arch/riscv/include/asm/sbi.h | 16 ++++++++++++++++
> arch/riscv/kernel/sbi.c | 19 +++++++++++++++++++
> arch/riscv/mm/Makefile | 4 ++++
> arch/riscv/mm/dma-mapping.c | 41 ++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 97 insertions(+)
> create mode 100644 arch/riscv/mm/dma-mapping.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 632fac5..94a736a 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -20,6 +20,9 @@ config RISCV
> select ARCH_HAS_DEBUG_VM_PGTABLE
> select ARCH_HAS_DEBUG_VIRTUAL if MMU
> select ARCH_HAS_DEBUG_WX
> + select ARCH_HAS_DMA_PREP_COHERENT if RISCV_DMA_COHERENT
> + select ARCH_HAS_SYNC_DMA_FOR_CPU if RISCV_DMA_COHERENT
> + select ARCH_HAS_SYNC_DMA_FOR_DEVICE if RISCV_DMA_COHERENT
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_GCOV_PROFILE_ALL
> select ARCH_HAS_GIGANTIC_PAGE
> @@ -41,6 +44,7 @@ config RISCV
> select CLONE_BACKWARDS
> select CLINT_TIMER if !MMU
> select COMMON_CLK
> + select DMA_DIRECT_REMAP if RISCV_DMA_COHERENT
> select EDAC_SUPPORT
> select GENERIC_ARCH_TOPOLOGY if SMP
> select GENERIC_ATOMIC64 if !64BIT
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index f822f22..8994d58 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -465,6 +465,19 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> return ptep_test_and_clear_young(vma, address, ptep);
> }
>
> +#ifdef CONFIG_RISCV_DMA_COHERENT
> +#define pgprot_noncached pgprot_noncached
> +static inline pgprot_t pgprot_noncached(pgprot_t _prot)
> +{
> + unsigned long prot = pgprot_val(_prot);
> +
> + prot &= ~_PAGE_DMA_MASK;
> + prot |= _PAGE_DMA_NONCACHE;
> +
> + return __pgprot(prot);
> +}
> +#endif
> +
> /*
> * Encode and decode a swap entry
> *
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 0d42693..08b4244 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -27,6 +27,7 @@ enum sbi_ext_id {
> SBI_EXT_IPI = 0x735049,
> SBI_EXT_RFENCE = 0x52464E43,
> SBI_EXT_HSM = 0x48534D,
> + SBI_EXT_DMA = 0xAB150401,
> };
>
> enum sbi_ext_base_fid {
> @@ -37,6 +38,7 @@ enum sbi_ext_base_fid {
> SBI_EXT_BASE_GET_MVENDORID,
> SBI_EXT_BASE_GET_MARCHID,
> SBI_EXT_BASE_GET_MIMPID,
> + SBI_EXT_RFENCE_REMOTE_DMA_SYNC,
Oops ... It's no use. Remove in next version patch.

> };
>
> enum sbi_ext_time_fid {
> @@ -63,6 +65,17 @@ enum sbi_ext_hsm_fid {
> SBI_EXT_HSM_HART_STATUS,
> };
>
> +enum sbi_ext_dma_fid {
> + SBI_DMA_SYNC = 0,
> +};
> +
> +enum sbi_dma_sync_data_direction {
> + SBI_DMA_BIDIRECTIONAL = 0,
> + SBI_DMA_TO_DEVICE = 1,
> + SBI_DMA_FROM_DEVICE = 2,
> + SBI_DMA_NONE = 3,
> +};
> +
> enum sbi_hsm_hart_status {
> SBI_HSM_HART_STATUS_STARTED = 0,
> SBI_HSM_HART_STATUS_STOPPED,
> @@ -128,6 +141,9 @@ int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
> unsigned long size,
> unsigned long asid);
> int sbi_probe_extension(int ext);
> +void sbi_dma_sync(unsigned long start,
> + unsigned long size,
> + enum sbi_dma_sync_data_direction dir);
>
> /* Check if current SBI specification version is 0.1 or not */
> static inline int sbi_spec_is_0_1(void)
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index 7402a41..ff8e18b 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -521,6 +521,25 @@ int sbi_probe_extension(int extid)
> }
> EXPORT_SYMBOL(sbi_probe_extension);
>
> +void sbi_dma_sync(unsigned long start,
> + unsigned long size,
> + enum sbi_dma_sync_data_direction dir)
> +{
> +#if 0
> + sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir,
> + 0, 0, 0);
> +#else
> + /* Just for try, it should be in sbi ecall and will be removed before merged */
> + register unsigned long i asm("a0") = start & ~(L1_CACHE_BYTES - 1);
> +
> + for (; i < (start + size); i += L1_CACHE_BYTES)
> + __asm__ __volatile__(".long 0x02b5000b");
> +
> + __asm__ __volatile__(".long 0x01b0000b");
> +#endif
> +}
> +EXPORT_SYMBOL(sbi_dma_sync);
> +
> static long __sbi_base_ecall(int fid)
> {
> struct sbiret ret;
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index 7ebaef1..b67d956 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -14,6 +14,10 @@ obj-$(CONFIG_MMU) += fault.o pageattr.o
> obj-y += cacheflush.o
> obj-y += context.o
>
> +ifeq ($(CONFIG_RISCV_DMA_COHERENT), y)
> +obj-y += dma-mapping.o
> +endif
> +
> ifeq ($(CONFIG_MMU),y)
> obj-$(CONFIG_SMP) += tlbflush.o
> endif
> diff --git a/arch/riscv/mm/dma-mapping.c b/arch/riscv/mm/dma-mapping.c
> new file mode 100644
> index 00000000..f5db60b
> --- /dev/null
> +++ b/arch/riscv/mm/dma-mapping.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/dma-map-ops.h>
> +#include <asm/sbi.h>
> +
> +void arch_dma_prep_coherent(struct page *page, size_t size)
> +{
> + void *ptr = page_address(page);
> +
> + memset(ptr, 0, size);
> + sbi_dma_sync(page_to_phys(page), size, SBI_DMA_BIDIRECTIONAL);
> +}
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + sbi_dma_sync(paddr, size, dir);
> + break;
> + default:
> + BUG();
> + }
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_TO_DEVICE:
> + return;
> + case DMA_FROM_DEVICE:
> + case DMA_BIDIRECTIONAL:
> + sbi_dma_sync(paddr, size, dir);
> + break;
> + default:
> + BUG();
> + }
> +}
> --
> 2.7.4
>


--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-19 19:18:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> This patch series looks like it might be useful for the StarFive JH7100
> [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> USB and SDIO require that the L2 cache must be manually flushed after
> DMA operations if the data is intended to be shared with U74 cores [2].

Not too much, given that the SiFive lineage CPUs have an uncached
window, that is a totally different way to allocate uncached memory.

> There is the RISC-V Cache Management Operation, or CMO, task group [3]
> but I am not sure if that can help the SoC's that have already been
> fabbed like the the D1 and the JH7100.

It does, because unimplemented instructions trap into M-mode, where they
can be emulated.

Or to summarize things. Non-coherent DMA (and not coherent as title in
this series) requires two things:

1) allocating chunks of memory that is marked as not cachable
2) instructions to invalidate and/or writeback cache lines

none of which currently exists in RISV-V. Hacking vendor specific
cruft into the kernel doesn't scale, as shown perfectly by this
series which requires to hard code vendor specific non-standardized
extensions in a kernel that makes it specific to that implementation.

What we need to do is to standardize a way to do this properly, and then
after that figure out a way to quirk in non-compliant implementations
in a way that does not harm the general kernel.

2021-05-19 19:18:32

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 2:06 PM Christoph Hellwig <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > Since the existing RISC-V ISA cannot solve this problem, it is better
> > to provide some configuration for the SOC vendor to customize.
>
> We've been talking about this problem for close to five years. So no,
> if you don't manage to get the feature into the ISA it can't be
> supported.

arch/riscv/errata/ is also defined in riscv ISA?

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-19 19:19:11

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 1:55 PM Christoph Hellwig <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 01:48:23PM +0800, Guo Ren wrote:
> > The patchset just leaves a configuration chance for vendors. Before
> > RISC-V ISA fixes it, we should give the chance to let vendor solve
> > their real chip issues.
>
> No. The vendors need to work to get a feature standardized before
> implementing it. There is other way to have a sane kernel build that
> supports all the different SOCs.

I've said the patchset doesn't define any features, It just leaves the
chance for vendors.

It's not in conflict with any standardized riscv ISA.


--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-19 19:20:14

by Anup Patel

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
> > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > Since the existing RISC-V ISA cannot solve this problem, it is better
> > > to provide some configuration for the SOC vendor to customize.
> >
> > We've been talking about this problem for close to five years. So no,
> > if you don't manage to get the feature into the ISA it can't be
> > supported.
>
> Isn't it a good goal for Linux to support the capabilities present in
> the SoC that a currently being fab'd?
>
> I believe the CMO group only started last year [1] so the RV64GC SoCs
> that are going into mass production this year would not have had the
> opporuntiy of utilizing any RISC-V ISA extension for handling cache
> management.

The current Linux RISC-V policy is to only accept patches for frozen or
ratified ISA specs.
(Refer, Documentation/riscv/patch-acceptance.rst)

This means even if emulate CMO instructions in OpenSBI, the Linux
patches won't be taken by Palmer because CMO specification is
still in draft stage.

Also, we all know how much time it takes for RISCV international
to freeze some spec. Judging by that we are looking at another
3-4 years at minimum.

Regards,
Anup

2021-05-19 19:22:41

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On 2021/05/19 16:16, Anup Patel wrote:
> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
>>
>> On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
>>> On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
>>>> Since the existing RISC-V ISA cannot solve this problem, it is better
>>>> to provide some configuration for the SOC vendor to customize.
>>>
>>> We've been talking about this problem for close to five years. So no,
>>> if you don't manage to get the feature into the ISA it can't be
>>> supported.
>>
>> Isn't it a good goal for Linux to support the capabilities present in
>> the SoC that a currently being fab'd?
>>
>> I believe the CMO group only started last year [1] so the RV64GC SoCs
>> that are going into mass production this year would not have had the
>> opporuntiy of utilizing any RISC-V ISA extension for handling cache
>> management.
>
> The current Linux RISC-V policy is to only accept patches for frozen or
> ratified ISA specs.
> (Refer, Documentation/riscv/patch-acceptance.rst)
>
> This means even if emulate CMO instructions in OpenSBI, the Linux
> patches won't be taken by Palmer because CMO specification is
> still in draft stage.
>
> Also, we all know how much time it takes for RISCV international
> to freeze some spec. Judging by that we are looking at another
> 3-4 years at minimum.

Which is the root cause of most problems with riscv extension support in Linux.
All RISC-V foundation members need to apply pressure on the foundation and these
standard groups to deliver frozen specifications with an acceptable schedule.
c.f. the H extensions specs which are not yet frozen despite not having been
changed for months if not years.


--
Damien Le Moal
Western Digital Research

2021-05-20 01:47:40

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
>
> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> > This patch series looks like it might be useful for the StarFive JH7100
> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> > USB and SDIO require that the L2 cache must be manually flushed after
> > DMA operations if the data is intended to be shared with U74 cores [2].
>
> Not too much, given that the SiFive lineage CPUs have an uncached
> window, that is a totally different way to allocate uncached memory.
It's a very big MIPS smell. What's the attribute of the uncached
window? (uncached + strong-order/ uncached + weak, most vendors still
use AXI interconnect, how to deal with a bufferable attribute?) In
fact, customers' drivers use different ways to deal with DMA memory in
non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
the same way in DMA memory is a smart choice. So using PTE attributes
is more suitable.

See: https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
4.4.1
The draft supports custom attribute bits in PTE.

Although I do not agree with uncached windows, this patchset does not
conflict with SiFive uncached windows.

>
> > There is the RISC-V Cache Management Operation, or CMO, task group [3]
> > but I am not sure if that can help the SoC's that have already been
> > fabbed like the the D1 and the JH7100.
>
> It does, because unimplemented instructions trap into M-mode, where they
> can be emulated.
>
> Or to summarize things. Non-coherent DMA (and not coherent as title in
> this series) requires two things:
>
> 1) allocating chunks of memory that is marked as not cachable
> 2) instructions to invalidate and/or writeback cache lines
Maybe sbi_ecall (dma_sync) is enough and let the vendor deal with it
in opensbi. From a hardware view, CMO instruction only could deal with
one cache line, then CMO-trap is not a good idea.

>
> none of which currently exists in RISV-V. Hacking vendor specific
> cruft into the kernel doesn't scale, as shown perfectly by this
> series which requires to hard code vendor-specific non-standardized
> extensions in a kernel that makes it specific to that implementation.
>
> What we need to do is to standardize a way to do this properly, and then
> after that figure out a way to quirk in non-compliant implementations
> in a way that does not harm the general kernel.


--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-20 01:49:25

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
> >
> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > > Since the existing RISC-V ISA cannot solve this problem, it is better
> > > > to provide some configuration for the SOC vendor to customize.
> > >
> > > We've been talking about this problem for close to five years. So no,
> > > if you don't manage to get the feature into the ISA it can't be
> > > supported.
> >
> > Isn't it a good goal for Linux to support the capabilities present in
> > the SoC that a currently being fab'd?
> >
> > I believe the CMO group only started last year [1] so the RV64GC SoCs
> > that are going into mass production this year would not have had the
> > opporuntiy of utilizing any RISC-V ISA extension for handling cache
> > management.
>
> The current Linux RISC-V policy is to only accept patches for frozen or
> ratified ISA specs.
> (Refer, Documentation/riscv/patch-acceptance.rst)
>
> This means even if emulate CMO instructions in OpenSBI, the Linux
> patches won't be taken by Palmer because CMO specification is
> still in draft stage.
How do you think about
sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir, 0, 0, 0);
? thx
>
> Also, we all know how much time it takes for RISCV international
> to freeze some spec. Judging by that we are looking at another
> 3-4 years at minimum.
>
> Regards,
> Anup



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-20 02:00:38

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Thu, May 20, 2021 at 9:47 AM Guo Ren <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]> wrote:
> >
> > On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
> > >
> > > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
> > > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > > > Since the existing RISC-V ISA cannot solve this problem, it is better
> > > > > to provide some configuration for the SOC vendor to customize.
> > > >
> > > > We've been talking about this problem for close to five years. So no,
> > > > if you don't manage to get the feature into the ISA it can't be
> > > > supported.
> > >
> > > Isn't it a good goal for Linux to support the capabilities present in
> > > the SoC that a currently being fab'd?
> > >
> > > I believe the CMO group only started last year [1] so the RV64GC SoCs
> > > that are going into mass production this year would not have had the
> > > opporuntiy of utilizing any RISC-V ISA extension for handling cache
> > > management.
> >
> > The current Linux RISC-V policy is to only accept patches for frozen or
> > ratified ISA specs.
> > (Refer, Documentation/riscv/patch-acceptance.rst)
> >
> > This means even if emulate CMO instructions in OpenSBI, the Linux
I think it's CBO now.
https://github.com/riscv/riscv-CMOs/blob/master/discussion-files/RISC_V_range_CMOs_bad_v1.00.pdf

> > patches won't be taken by Palmer because CMO specification is
> > still in draft stage.
> How do you think about
> sbi_ecall(SBI_EXT_DMA, SBI_DMA_SYNC, start, size, dir, 0, 0, 0);
> ? thx
CBO insn trap is okay for me ;-)

> >
> > Also, we all know how much time it takes for RISCV international
> > to freeze some spec. Judging by that we are looking at another
> > 3-4 years at minimum.
> >
> > Regards,
> > Anup
>
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-20 05:49:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Thu, May 20, 2021 at 09:45:45AM +0800, Guo Ren wrote:
> It's a very big MIPS smell. What's the attribute of the uncached
> window? (uncached + strong-order/ uncached + weak, most vendors still
> use AXI interconnect, how to deal with a bufferable attribute?) In
> fact, customers' drivers use different ways to deal with DMA memory in
> non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> the same way in DMA memory is a smart choice. So using PTE attributes
> is more suitable.

I'm not saying it is a good idea. Just that apparently this exists in
the ASICs.

2021-05-22 00:39:02

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]> wrote:
>
> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
> >
> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > > Since the existing RISC-V ISA cannot solve this problem, it is better
> > > > to provide some configuration for the SOC vendor to customize.
> > >
> > > We've been talking about this problem for close to five years. So no,
> > > if you don't manage to get the feature into the ISA it can't be
> > > supported.
> >
> > Isn't it a good goal for Linux to support the capabilities present in
> > the SoC that a currently being fab'd?
> >
> > I believe the CMO group only started last year [1] so the RV64GC SoCs
> > that are going into mass production this year would not have had the
> > opporuntiy of utilizing any RISC-V ISA extension for handling cache
> > management.
>
> The current Linux RISC-V policy is to only accept patches for frozen or
> ratified ISA specs.
> (Refer, Documentation/riscv/patch-acceptance.rst)
>
> This means even if emulate CMO instructions in OpenSBI, the Linux
> patches won't be taken by Palmer because CMO specification is
> still in draft stage.
Before CMO specification release, could we use a sbi_ecall to solve
the current problem? This is not against the specification, when CMO
is ready we could let users choose to use the new CMO in Linux.

From a tech view, CMO trap emulation is the same as sbi_ecall.

>
> Also, we all know how much time it takes for RISCV international
> to freeze some spec. Judging by that we are looking at another
> 3-4 years at minimum.
>
> Regards,
> Anup



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-05-30 00:33:05

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]> wrote:
>>
>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
>> >
>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
>> > > > Since the existing RISC-V ISA cannot solve this problem, it is better
>> > > > to provide some configuration for the SOC vendor to customize.
>> > >
>> > > We've been talking about this problem for close to five years. So no,
>> > > if you don't manage to get the feature into the ISA it can't be
>> > > supported.
>> >
>> > Isn't it a good goal for Linux to support the capabilities present in
>> > the SoC that a currently being fab'd?
>> >
>> > I believe the CMO group only started last year [1] so the RV64GC SoCs
>> > that are going into mass production this year would not have had the
>> > opporuntiy of utilizing any RISC-V ISA extension for handling cache
>> > management.
>>
>> The current Linux RISC-V policy is to only accept patches for frozen or
>> ratified ISA specs.
>> (Refer, Documentation/riscv/patch-acceptance.rst)
>>
>> This means even if emulate CMO instructions in OpenSBI, the Linux
>> patches won't be taken by Palmer because CMO specification is
>> still in draft stage.
> Before CMO specification release, could we use a sbi_ecall to solve
> the current problem? This is not against the specification, when CMO
> is ready we could let users choose to use the new CMO in Linux.
>
> From a tech view, CMO trap emulation is the same as sbi_ecall.
>
>>
>> Also, we all know how much time it takes for RISCV international
>> to freeze some spec. Judging by that we are looking at another
>> 3-4 years at minimum.

Sorry for being slow here, this thread got buried.

I've been trying to work with a handful of folks at the RISC-V
foundation to try and get a subset of the various in-development
specifications (some simple CMOs, something about non-caching in the
page tables, and some way to prevent speculative accesse from generating
coherence traffic that will break non-coherent systems). I'm not sure
we can get this together quickly, but I'd prefer to at least try before
we jump to taking vendor-specificed behavior here. It's obviously an
up-hill battle to try and get specifications through the process and I'm
certainly not going to promise it will work, but I'm hoping that the
impending need to avoid forking the ISA will be sufficient to get people
behind producing some specifications in a timely fashion.

I wasn't aware than this chip had non-coherent devices until I saw this
thread, so we'd been mostly focused on the Beagle V chip. That was in a
sense an easier problem because the SiFive IP in it was never designed
to have non-coherent devices so we'd have to make anything work via a
series of slow workarounds, which would make emulating the eventually
standardized behavior reasonable in terms of performance (ie, everything
would be super slow so who really cares).

I don't think relying on some sort of SBI call for the CMOs whould be
such a performance hit that it would prevent these systems from being
viable, but assuming you have reasonable performance on your non-cached
accesses then that's probably not going to be viable to trap and
emulate. At that point it really just becomes silly to pretend that
we're still making things work by emulating the eventually ratified
behavior, as anyone who actually tries to use this thing to do IO would
need out of tree patches. I'm not sure exactly what the plan is for the
page table bits in the specification right now, but if you can give me a
pointer to some documentation then I'm happy to try and push for
something compatible.

If we can't make the process work at the foundation then I'd be strongly
in favor of just biting the bullet and starting to take vendor-specific
code that's been implemented in hardware and is necessarry to make
things work acceptably. That's obviously a sub-optimal solution as
it'll lead to a bunch of ISA fragmentation, but at least we'll be able
to keep the software stack together.

Can you tell us when these will be in the hands of users? That's pretty
important here, as I don't want to be blocking real users from having
their hardware work. IIRC there were some plans to distribute early
boards, but it looks like the foundation got involved and I guess I lost
the thread at that point.

Sorry this is all such a headache, but hopefully we can get things
sorted out.

>>
>> Regards,
>> Anup

2021-06-03 04:16:49

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
>> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]> wrote:
>>>
>>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini <[email protected]> wrote:
>>> >
>>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig wrote:
>>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
>>> > > > Since the existing RISC-V ISA cannot solve this problem, it is better
>>> > > > to provide some configuration for the SOC vendor to customize.
>>> > >
>>> > > We've been talking about this problem for close to five years. So no,
>>> > > if you don't manage to get the feature into the ISA it can't be
>>> > > supported.
>>> >
>>> > Isn't it a good goal for Linux to support the capabilities present in
>>> > the SoC that a currently being fab'd?
>>> >
>>> > I believe the CMO group only started last year [1] so the RV64GC SoCs
>>> > that are going into mass production this year would not have had the
>>> > opporuntiy of utilizing any RISC-V ISA extension for handling cache
>>> > management.
>>>
>>> The current Linux RISC-V policy is to only accept patches for frozen or
>>> ratified ISA specs.
>>> (Refer, Documentation/riscv/patch-acceptance.rst)
>>>
>>> This means even if emulate CMO instructions in OpenSBI, the Linux
>>> patches won't be taken by Palmer because CMO specification is
>>> still in draft stage.
>> Before CMO specification release, could we use a sbi_ecall to solve
>> the current problem? This is not against the specification, when CMO
>> is ready we could let users choose to use the new CMO in Linux.
>>
>> From a tech view, CMO trap emulation is the same as sbi_ecall.
>>
>>>
>>> Also, we all know how much time it takes for RISCV international
>>> to freeze some spec. Judging by that we are looking at another
>>> 3-4 years at minimum.
>
> Sorry for being slow here, this thread got buried.
>
> I've been trying to work with a handful of folks at the RISC-V
> foundation to try and get a subset of the various in-development
> specifications (some simple CMOs, something about non-caching in the
> page tables, and some way to prevent speculative accesse from generating
> coherence traffic that will break non-coherent systems). I'm not sure
> we can get this together quickly, but I'd prefer to at least try before
> we jump to taking vendor-specificed behavior here. It's obviously an
> up-hill battle to try and get specifications through the process and I'm
> certainly not going to promise it will work, but I'm hoping that the
> impending need to avoid forking the ISA will be sufficient to get people
> behind producing some specifications in a timely fashion.
>
> I wasn't aware than this chip had non-coherent devices until I saw this
> thread, so we'd been mostly focused on the Beagle V chip. That was in a
> sense an easier problem because the SiFive IP in it was never designed
> to have non-coherent devices so we'd have to make anything work via a
> series of slow workarounds, which would make emulating the eventually
> standardized behavior reasonable in terms of performance (ie, everything
> would be super slow so who really cares).
>
> I don't think relying on some sort of SBI call for the CMOs whould be
> such a performance hit that it would prevent these systems from being
> viable, but assuming you have reasonable performance on your non-cached
> accesses then that's probably not going to be viable to trap and
> emulate. At that point it really just becomes silly to pretend that
> we're still making things work by emulating the eventually ratified
> behavior, as anyone who actually tries to use this thing to do IO would
> need out of tree patches. I'm not sure exactly what the plan is for the
> page table bits in the specification right now, but if you can give me a
> pointer to some documentation then I'm happy to try and push for
> something compatible.
>
> If we can't make the process work at the foundation then I'd be strongly
> in favor of just biting the bullet and starting to take vendor-specific
> code that's been implemented in hardware and is necessarry to make
> things work acceptably. That's obviously a sub-optimal solution as
> it'll lead to a bunch of ISA fragmentation, but at least we'll be able
> to keep the software stack together.
>
> Can you tell us when these will be in the hands of users? That's pretty
> important here, as I don't want to be blocking real users from having
> their hardware work. IIRC there were some plans to distribute early
> boards, but it looks like the foundation got involved and I guess I lost
> the thread at that point.
>
> Sorry this is all such a headache, but hopefully we can get things
> sorted out.

I talked with some of the RISC-V foundation folks, we're not going to
have an ISA specification for the non-coherent stuff any time soon. I
took a look at this code and I definately don't want to take it as is,
but I'm not opposed to taking something that makes the hardware work as
long as it's a lot cleaner. We've already got two of these non-coherent
chips, I'm sure more will come, and I'd rather have the extra headaches
than make everyone fork the software stack.

After talking to Atish it looks like there's likely to be an SBI
extension to handle the CMOs, which should let us avoid the bulk of the
vendor-specific behavior in the kernel. I know some people are worried
about adding to the SBI surface. I'm worried about that too, but that's
way better than sticking a bunch of vendor-specific instructions into
the kernel. The SBI extension should make for a straight-forward cache
flush implementation in Linux, so let's just plan on that getting
through quickly (as has been done before).

Unfortunately we've yet to come up with a way to handle the
non-cacheable mappings without introducing a degree of vendor-specific
behavior or seriously impacting performance (mark them as not valid and
deal with them in the trap handler). I'm not really sure it counts as
supporting the hardware if it's massively slow, so that really leaves us
with vendor-specific mappings as the only option to make these chips
work.

This implementation, which adds some Kconfig entries that control page
table bits, definately isn't suitable for upstream. Allowing users to
set arbitrary page table bits will eventually conflict with the
standard, and is just going to be a mess. It'll also lead to kernels
that are only compatible with specific designs, which we're trying very
hard to avoid. At a bare minimum we'll need some way to detect systems
with these page table bits before setting them, and some description of
what the bits actually do so we can reason about them.

2021-06-03 06:02:41

by Anup Patel

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support



> -----Original Message-----
> From: Palmer Dabbelt <[email protected]>
> Sent: 03 June 2021 09:43
> To: [email protected]
> Cc: [email protected]; [email protected]; Christoph Hellwig
> <[email protected]>; Anup Patel <[email protected]>; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; Paul Walmsley
> <[email protected]>
> Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
>
> On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
> >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]>
> wrote:
> >>>
> >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> <[email protected]> wrote:
> >>> >
> >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
> wrote:
> >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is
> >>> > > > better to provide some configuration for the SOC vendor to
> customize.
> >>> > >
> >>> > > We've been talking about this problem for close to five years.
> >>> > > So no, if you don't manage to get the feature into the ISA it
> >>> > > can't be supported.
> >>> >
> >>> > Isn't it a good goal for Linux to support the capabilities present
> >>> > in the SoC that a currently being fab'd?
> >>> >
> >>> > I believe the CMO group only started last year [1] so the RV64GC
> >>> > SoCs that are going into mass production this year would not have
> >>> > had the opporuntiy of utilizing any RISC-V ISA extension for
> >>> > handling cache management.
> >>>
> >>> The current Linux RISC-V policy is to only accept patches for frozen
> >>> or ratified ISA specs.
> >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> >>>
> >>> This means even if emulate CMO instructions in OpenSBI, the Linux
> >>> patches won't be taken by Palmer because CMO specification is still
> >>> in draft stage.
> >> Before CMO specification release, could we use a sbi_ecall to solve
> >> the current problem? This is not against the specification, when CMO
> >> is ready we could let users choose to use the new CMO in Linux.
> >>
> >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> >>
> >>>
> >>> Also, we all know how much time it takes for RISCV international to
> >>> freeze some spec. Judging by that we are looking at another
> >>> 3-4 years at minimum.
> >
> > Sorry for being slow here, this thread got buried.
> >
> > I've been trying to work with a handful of folks at the RISC-V
> > foundation to try and get a subset of the various in-development
> > specifications (some simple CMOs, something about non-caching in the
> > page tables, and some way to prevent speculative accesse from
> > generating coherence traffic that will break non-coherent systems).
> > I'm not sure we can get this together quickly, but I'd prefer to at
> > least try before we jump to taking vendor-specificed behavior here.
> > It's obviously an up-hill battle to try and get specifications through
> > the process and I'm certainly not going to promise it will work, but
> > I'm hoping that the impending need to avoid forking the ISA will be
> > sufficient to get people behind producing some specifications in a timely
> fashion.
> >
> > I wasn't aware than this chip had non-coherent devices until I saw
> > this thread, so we'd been mostly focused on the Beagle V chip. That
> > was in a sense an easier problem because the SiFive IP in it was never
> > designed to have non-coherent devices so we'd have to make anything
> > work via a series of slow workarounds, which would make emulating the
> > eventually standardized behavior reasonable in terms of performance
> > (ie, everything would be super slow so who really cares).
> >
> > I don't think relying on some sort of SBI call for the CMOs whould be
> > such a performance hit that it would prevent these systems from being
> > viable, but assuming you have reasonable performance on your
> > non-cached accesses then that's probably not going to be viable to
> > trap and emulate. At that point it really just becomes silly to
> > pretend that we're still making things work by emulating the
> > eventually ratified behavior, as anyone who actually tries to use this
> > thing to do IO would need out of tree patches. I'm not sure exactly
> > what the plan is for the page table bits in the specification right
> > now, but if you can give me a pointer to some documentation then I'm
> > happy to try and push for something compatible.
> >
> > If we can't make the process work at the foundation then I'd be
> > strongly in favor of just biting the bullet and starting to take
> > vendor-specific code that's been implemented in hardware and is
> > necessarry to make things work acceptably. That's obviously a
> > sub-optimal solution as it'll lead to a bunch of ISA fragmentation,
> > but at least we'll be able to keep the software stack together.
> >
> > Can you tell us when these will be in the hands of users? That's
> > pretty important here, as I don't want to be blocking real users from
> > having their hardware work. IIRC there were some plans to distribute
> > early boards, but it looks like the foundation got involved and I
> > guess I lost the thread at that point.
> >
> > Sorry this is all such a headache, but hopefully we can get things
> > sorted out.
>
> I talked with some of the RISC-V foundation folks, we're not going to have an
> ISA specification for the non-coherent stuff any time soon. I took a look at
> this code and I definately don't want to take it as is, but I'm not opposed to
> taking something that makes the hardware work as long as it's a lot cleaner.
> We've already got two of these non-coherent chips, I'm sure more will come,
> and I'd rather have the extra headaches than make everyone fork the software
> stack.

Thanks for confirming. The CMO extension is still in early stages so it will
certainly take more time for them. After CMO extension is finalized, it will
take some more time to have actual RISC-V platforms with CMO implemented.

>
> After talking to Atish it looks like there's likely to be an SBI extension to
> handle the CMOs, which should let us avoid the bulk of the vendor-specific
> behavior in the kernel. I know some people are worried about adding to the
> SBI surface. I'm worried about that too, but that's way better than sticking a
> bunch of vendor-specific instructions into the kernel. The SBI extension
> should make for a straight-forward cache flush implementation in Linux, so
> let's just plan on that getting through quickly (as has been done before).

Yes, I agree. We can have just a single SBI call which is meant for DMA sync
purpose only which means it will flush/invalidate pages from all cache
levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The
CMO extension might more generic cache operations which can target
any cache level.

I am already preparing a write-up for SBI DMA sync call in SBI spec. I will
share it with you separately as well.

>
> Unfortunately we've yet to come up with a way to handle the non-cacheable
> mappings without introducing a degree of vendor-specific behavior or
> seriously impacting performance (mark them as not valid and deal with them
> in the trap handler). I'm not really sure it counts as supporting the hardware
> if it's massively slow, so that really leaves us with vendor-specific mappings as
> the only option to make these chips work.

A RISC-V platform can have non-cacheable mappings is following possible
ways:
1) Fixed physical address range as non-cacheable using PMAs
2) Custom page table attributes
3) Svpmbt extension being defined by RVI

Atish and me both think it is possible to have RISC-V specific DMA ops
implementation which can handle all above case. Atish is already working
on DMA ops implementation for RISC-V.

>
> This implementation, which adds some Kconfig entries that control page table
> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
> page table bits will eventually conflict with the standard, and is just going to
> be a mess. It'll also lead to kernels that are only compatible with specific
> designs, which we're trying very hard to avoid. At a bare minimum we'll need
> some way to detect systems with these page table bits before setting them,
> and some description of what the bits actually do so we can reason about
> them.

Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
goal of unified kernel image for all platforms.

Regards,
Anup

2021-06-03 15:41:23

by Palmer Dabbelt

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote:
>
>
>> -----Original Message-----
>> From: Palmer Dabbelt <[email protected]>
>> Sent: 03 June 2021 09:43
>> To: [email protected]
>> Cc: [email protected]; [email protected]; Christoph Hellwig
>> <[email protected]>; Anup Patel <[email protected]>; [email protected];
>> [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; linux-
>> [email protected]; [email protected]; Paul Walmsley
>> <[email protected]>
>> Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
>>
>> On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
>> > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
>> >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]>
>> wrote:
>> >>>
>> >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
>> <[email protected]> wrote:
>> >>> >
>> >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
>> wrote:
>> >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
>> >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is
>> >>> > > > better to provide some configuration for the SOC vendor to
>> customize.
>> >>> > >
>> >>> > > We've been talking about this problem for close to five years.
>> >>> > > So no, if you don't manage to get the feature into the ISA it
>> >>> > > can't be supported.
>> >>> >
>> >>> > Isn't it a good goal for Linux to support the capabilities present
>> >>> > in the SoC that a currently being fab'd?
>> >>> >
>> >>> > I believe the CMO group only started last year [1] so the RV64GC
>> >>> > SoCs that are going into mass production this year would not have
>> >>> > had the opporuntiy of utilizing any RISC-V ISA extension for
>> >>> > handling cache management.
>> >>>
>> >>> The current Linux RISC-V policy is to only accept patches for frozen
>> >>> or ratified ISA specs.
>> >>> (Refer, Documentation/riscv/patch-acceptance.rst)
>> >>>
>> >>> This means even if emulate CMO instructions in OpenSBI, the Linux
>> >>> patches won't be taken by Palmer because CMO specification is still
>> >>> in draft stage.
>> >> Before CMO specification release, could we use a sbi_ecall to solve
>> >> the current problem? This is not against the specification, when CMO
>> >> is ready we could let users choose to use the new CMO in Linux.
>> >>
>> >> From a tech view, CMO trap emulation is the same as sbi_ecall.
>> >>
>> >>>
>> >>> Also, we all know how much time it takes for RISCV international to
>> >>> freeze some spec. Judging by that we are looking at another
>> >>> 3-4 years at minimum.
>> >
>> > Sorry for being slow here, this thread got buried.
>> >
>> > I've been trying to work with a handful of folks at the RISC-V
>> > foundation to try and get a subset of the various in-development
>> > specifications (some simple CMOs, something about non-caching in the
>> > page tables, and some way to prevent speculative accesse from
>> > generating coherence traffic that will break non-coherent systems).
>> > I'm not sure we can get this together quickly, but I'd prefer to at
>> > least try before we jump to taking vendor-specificed behavior here.
>> > It's obviously an up-hill battle to try and get specifications through
>> > the process and I'm certainly not going to promise it will work, but
>> > I'm hoping that the impending need to avoid forking the ISA will be
>> > sufficient to get people behind producing some specifications in a timely
>> fashion.
>> >
>> > I wasn't aware than this chip had non-coherent devices until I saw
>> > this thread, so we'd been mostly focused on the Beagle V chip. That
>> > was in a sense an easier problem because the SiFive IP in it was never
>> > designed to have non-coherent devices so we'd have to make anything
>> > work via a series of slow workarounds, which would make emulating the
>> > eventually standardized behavior reasonable in terms of performance
>> > (ie, everything would be super slow so who really cares).
>> >
>> > I don't think relying on some sort of SBI call for the CMOs whould be
>> > such a performance hit that it would prevent these systems from being
>> > viable, but assuming you have reasonable performance on your
>> > non-cached accesses then that's probably not going to be viable to
>> > trap and emulate. At that point it really just becomes silly to
>> > pretend that we're still making things work by emulating the
>> > eventually ratified behavior, as anyone who actually tries to use this
>> > thing to do IO would need out of tree patches. I'm not sure exactly
>> > what the plan is for the page table bits in the specification right
>> > now, but if you can give me a pointer to some documentation then I'm
>> > happy to try and push for something compatible.
>> >
>> > If we can't make the process work at the foundation then I'd be
>> > strongly in favor of just biting the bullet and starting to take
>> > vendor-specific code that's been implemented in hardware and is
>> > necessarry to make things work acceptably. That's obviously a
>> > sub-optimal solution as it'll lead to a bunch of ISA fragmentation,
>> > but at least we'll be able to keep the software stack together.
>> >
>> > Can you tell us when these will be in the hands of users? That's
>> > pretty important here, as I don't want to be blocking real users from
>> > having their hardware work. IIRC there were some plans to distribute
>> > early boards, but it looks like the foundation got involved and I
>> > guess I lost the thread at that point.
>> >
>> > Sorry this is all such a headache, but hopefully we can get things
>> > sorted out.
>>
>> I talked with some of the RISC-V foundation folks, we're not going to have an
>> ISA specification for the non-coherent stuff any time soon. I took a look at
>> this code and I definately don't want to take it as is, but I'm not opposed to
>> taking something that makes the hardware work as long as it's a lot cleaner.
>> We've already got two of these non-coherent chips, I'm sure more will come,
>> and I'd rather have the extra headaches than make everyone fork the software
>> stack.
>
> Thanks for confirming. The CMO extension is still in early stages so it will
> certainly take more time for them. After CMO extension is finalized, it will
> take some more time to have actual RISC-V platforms with CMO implemented.

Agreed. It's going to take two or three years from the standard to get
hardware to supporting it, so that means we're three or four years away
(at least, there's not even any solid timeline for a spec a year from
now) from having hardware. There's just going to be too much
non-standard hardware here to try to ignore it all.

>> After talking to Atish it looks like there's likely to be an SBI extension to
>> handle the CMOs, which should let us avoid the bulk of the vendor-specific
>> behavior in the kernel. I know some people are worried about adding to the
>> SBI surface. I'm worried about that too, but that's way better than sticking a
>> bunch of vendor-specific instructions into the kernel. The SBI extension
>> should make for a straight-forward cache flush implementation in Linux, so
>> let's just plan on that getting through quickly (as has been done before).
>
> Yes, I agree. We can have just a single SBI call which is meant for DMA sync
> purpose only which means it will flush/invalidate pages from all cache
> levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The
> CMO extension might more generic cache operations which can target
> any cache level.
>
> I am already preparing a write-up for SBI DMA sync call in SBI spec. I will
> share it with you separately as well.

Great, thanks. Atish sort of mentioned that, but I didn't want to put
words in your mouth (and I assume you were aleep or something, due to
time zones).

>> Unfortunately we've yet to come up with a way to handle the non-cacheable
>> mappings without introducing a degree of vendor-specific behavior or
>> seriously impacting performance (mark them as not valid and deal with them
>> in the trap handler). I'm not really sure it counts as supporting the hardware
>> if it's massively slow, so that really leaves us with vendor-specific mappings as
>> the only option to make these chips work.
>
> A RISC-V platform can have non-cacheable mappings is following possible
> ways:
> 1) Fixed physical address range as non-cacheable using PMAs
> 2) Custom page table attributes
> 3) Svpmbt extension being defined by RVI
>
> Atish and me both think it is possible to have RISC-V specific DMA ops
> implementation which can handle all above case. Atish is already working
> on DMA ops implementation for RISC-V.

Great, thanks. I haven't started writing any code, but I think we're
going to be able to get a big chunk of #1 from the "dma-ranges" device
tree stuff. I think we still need some arch-specific allocation work to
make sure we don't alias, though.

The page table attributes are definately going to need dma ops. I'd
been assuming we'd have multiple DMA op tables for the multiple flavors,
but if they fit into a single op table cleanly that's fine -- that sort
of stuff really needs the code here.

Since you guys have already started I'll just wait for patches.

Thanks!

>> This implementation, which adds some Kconfig entries that control page table
>> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
>> page table bits will eventually conflict with the standard, and is just going to
>> be a mess. It'll also lead to kernels that are only compatible with specific
>> designs, which we're trying very hard to avoid. At a bare minimum we'll need
>> some way to detect systems with these page table bits before setting them,
>> and some description of what the bits actually do so we can reason about
>> them.
>
> Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
> goal of unified kernel image for all platforms.

I think this is just a phrasing issue, but just to be sure:

IMO it's not that they're vendor-specific Kconfig options, it's that
turning them on will conflict with standard systems (and other vendors).
We've already got the ability to select sets of Kconfig settings that
will only boot on one vendor's system, which is fine, as long as there
remains a set of Kconfig settings that will boot on all systems.

An example here would be the errata: every system has errata of some
sort, so if we start flipping off various vendor's errata Kconfigs
you'll end up with kernels that only function properly on some systems.
That's fine with me, as long as it's possible to turn on all vendor's
errata Kconfigs at the same time and the resulting kernel functions
correctly on all systems.

> Regards,
> Anup

2021-06-04 09:05:51

by David Laight

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: Palmer Dabbelt
> Sent: 03 June 2021 16:39
...
> An example here would be the errata: every system has errata of some
> sort, so if we start flipping off various vendor's errata Kconfigs
> you'll end up with kernels that only function properly on some systems.
> That's fine with me, as long as it's possible to turn on all vendor's
> errata Kconfigs at the same time and the resulting kernel functions
> correctly on all systems.

ISTM that if you can (easily) detect the errata then the detection
should be left it - but the kernel fail to boot if the system
needs the errata fixed.

The same would be needed for DMA in systems with non-coherent memory.

Only a hardware engineer would build a system with non-coherent memory
and without the ability to do uncached accesses and flush/invalidate
small sections of cache.

Mind you we did get a dual-cpu system that didn't have cache-coherency
between the cpus! That was singularly useless.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-06-04 09:59:25

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <[email protected]> wrote:
> On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote:
> >> This implementation, which adds some Kconfig entries that control page table
> >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
> >> page table bits will eventually conflict with the standard, and is just going to
> >> be a mess. It'll also lead to kernels that are only compatible with specific
> >> designs, which we're trying very hard to avoid. At a bare minimum we'll need
> >> some way to detect systems with these page table bits before setting them,
> >> and some description of what the bits actually do so we can reason about
> >> them.
> >
> > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
> > goal of unified kernel image for all platforms.
>
> I think this is just a phrasing issue, but just to be sure:
>
> IMO it's not that they're vendor-specific Kconfig options, it's that
> turning them on will conflict with standard systems (and other vendors).
> We've already got the ability to select sets of Kconfig settings that
> will only boot on one vendor's system, which is fine, as long as there
> remains a set of Kconfig settings that will boot on all systems.
>
> An example here would be the errata: every system has errata of some
> sort, so if we start flipping off various vendor's errata Kconfigs
> you'll end up with kernels that only function properly on some systems.
> That's fine with me, as long as it's possible to turn on all vendor's
> errata Kconfigs at the same time and the resulting kernel functions
> correctly on all systems.

Yes, this is generally the goal, and it would be great to have that
working in a way where a 'defconfig' build just turns on all the options
that are needed to use any SoC specific features and drivers while
still working on all hardware. There are however limits you may run
into at some point, and other architectures usually only manage to span
some 10 to 15 years of hardware implementations with a single
kernel before it get really hard.

To give some common examples that make it break down:

- 32-bit vs 64-bit already violates that rule on risc-v (as it does on
most other architectures)

- architectures that support both big-endian and little-endian kernels
tend to have platforms that require one or the other (e.g. mips,
though not arm). Not an issue for you.

- page table formats are the main cause of incompatibility: arm32
and x86-32 require three-level tables for certain features, but those
are incompatible with older cores, arm64 supports three different
page sizes, but none of them works on all cores (4KB almost works
everywhere).

- SMP-enabled ARMv7 kernels can be configured to run on either
ARMv6 or ARMv8, but not both, in this case because of incompatible
barrier instructions.

- 32-bit Arm has a couple more remaining features that require building
a machine specific kernel if enabled because they hardcode physical
addresses: early printk (debug_ll, not the normal earlycon), NOMMU,
and XIP.

Arnd

2021-06-04 14:48:54

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Hi Arnd & Palmer,

Sorry for the delayed reply, I'm working on the next version of the patch.

On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <[email protected]> wrote:
> > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote:
> > >> This implementation, which adds some Kconfig entries that control page table
> > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
> > >> page table bits will eventually conflict with the standard, and is just going to
> > >> be a mess. It'll also lead to kernels that are only compatible with specific
> > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need
> > >> some way to detect systems with these page table bits before setting them,
> > >> and some description of what the bits actually do so we can reason about
> > >> them.
> > >
> > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
> > > goal of unified kernel image for all platforms.
Okay, Agree. Please help review the next version of the patch.

> >
> > I think this is just a phrasing issue, but just to be sure:
> >
> > IMO it's not that they're vendor-specific Kconfig options, it's that
> > turning them on will conflict with standard systems (and other vendors).
> > We've already got the ability to select sets of Kconfig settings that
> > will only boot on one vendor's system, which is fine, as long as there
> > remains a set of Kconfig settings that will boot on all systems.
> >
> > An example here would be the errata: every system has errata of some
> > sort, so if we start flipping off various vendor's errata Kconfigs
> > you'll end up with kernels that only function properly on some systems.
> > That's fine with me, as long as it's possible to turn on all vendor's
> > errata Kconfigs at the same time and the resulting kernel functions
> > correctly on all systems.
>
> Yes, this is generally the goal, and it would be great to have that
> working in a way where a 'defconfig' build just turns on all the options
> that are needed to use any SoC specific features and drivers while
> still working on all hardware. There are however limits you may run
> into at some point, and other architectures usually only manage to span
> some 10 to 15 years of hardware implementations with a single
> kernel before it get really hard.
I could follow the goal in the next version of the patchset. Please
help review, thx.

>
> To give some common examples that make it break down:
>
> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on
> most other architectures)
>
> - architectures that support both big-endian and little-endian kernels
> tend to have platforms that require one or the other (e.g. mips,
> though not arm). Not an issue for you.
>
> - page table formats are the main cause of incompatibility: arm32
> and x86-32 require three-level tables for certain features, but those
> are incompatible with older cores, arm64 supports three different
> page sizes, but none of them works on all cores (4KB almost works
> everywhere).
>
> - SMP-enabled ARMv7 kernels can be configured to run on either
> ARMv6 or ARMv8, but not both, in this case because of incompatible
> barrier instructions.
>
> - 32-bit Arm has a couple more remaining features that require building
> a machine specific kernel if enabled because they hardcode physical
> addresses: early printk (debug_ll, not the normal earlycon), NOMMU,
> and XIP.
>
> Arnd



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-04 16:15:02

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Fri, 04 Jun 2021 07:47:22 PDT (-0700), [email protected] wrote:
> Hi Arnd & Palmer,
>
> Sorry for the delayed reply, I'm working on the next version of the patch.
>
> On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <[email protected]> wrote:
>>
>> On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <[email protected]> wrote:
>> > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote:
>> > >> This implementation, which adds some Kconfig entries that control page table
>> > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
>> > >> page table bits will eventually conflict with the standard, and is just going to
>> > >> be a mess. It'll also lead to kernels that are only compatible with specific
>> > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need
>> > >> some way to detect systems with these page table bits before setting them,
>> > >> and some description of what the bits actually do so we can reason about
>> > >> them.
>> > >
>> > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
>> > > goal of unified kernel image for all platforms.
> Okay, Agree. Please help review the next version of the patch.
>
>> >
>> > I think this is just a phrasing issue, but just to be sure:
>> >
>> > IMO it's not that they're vendor-specific Kconfig options, it's that
>> > turning them on will conflict with standard systems (and other vendors).
>> > We've already got the ability to select sets of Kconfig settings that
>> > will only boot on one vendor's system, which is fine, as long as there
>> > remains a set of Kconfig settings that will boot on all systems.
>> >
>> > An example here would be the errata: every system has errata of some
>> > sort, so if we start flipping off various vendor's errata Kconfigs
>> > you'll end up with kernels that only function properly on some systems.
>> > That's fine with me, as long as it's possible to turn on all vendor's
>> > errata Kconfigs at the same time and the resulting kernel functions
>> > correctly on all systems.
>>
>> Yes, this is generally the goal, and it would be great to have that
>> working in a way where a 'defconfig' build just turns on all the options
>> that are needed to use any SoC specific features and drivers while
>> still working on all hardware. There are however limits you may run
>> into at some point, and other architectures usually only manage to span
>> some 10 to 15 years of hardware implementations with a single
>> kernel before it get really hard.
> I could follow the goal in the next version of the patchset. Please
> help review, thx.

IMO we're essentially here now with the RISC-V stuff: defconfig flips on
everything necesasry to boot normal-smelling SOCs, with everything being
detected as the system boots. We have some wacky configurations like
!MMU and XIP that are coupled to the hardware, but (and sorry for
crossing the other threads, I missed your pointer as it's early here) as
I said in the other thread it might be time to make it explicit that
those things are non-portable.

The hope here has always been that we'd have enough in the standards
that we could avoid a proliferation of vendor-specific code. We've
always put a strong "things keep working forever" stake in the ground in
RISC-V land, but that's largely been because we were counting on the
standards existing that make support easy. In practice we don't have
those standards so we're ending up with a fairly large software base
that is required to support everything. We don't have all that much
hardware right now so we'll have to see how it goes, but for now I'm in
favor of keeping defconfig as a "boots on everything" sort of setup --
both because it makes life easier for users, and because it makes issues
like the non-portable Kconfigs that showed up here quite explicit.

If we get to 10/15 years of hardware then I'm sure we'll be removing old
systems from defconfig (or maybe even the kernel entirely, a lot of this
stuff isn't in production). I'm just hoping we make it that far ;)

>> To give some common examples that make it break down:
>>
>> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on
>> most other architectures)

Yes, and there's no way around that on RISC-V. They're different base
ISAs therefor re-define the same instructions, so we're essentially at
two kernel binaries by that point. The platform spec says rv64gc, so we
can kind of punt on this one for now. If rv32 hardware shows up
we'll probably want a standard system there too, which is why we've
avoided coupling kernel portability to XLEN.

>> - architectures that support both big-endian and little-endian kernels
>> tend to have platforms that require one or the other (e.g. mips,
>> though not arm). Not an issue for you.

It is now! We've added big-endian to RISC-V. There's no hardware yet
and very little software support. IMO the right answer is to ban that
from the platform spec, but again it'll depnd on what vendors want to
build (though anyone is listening, please don't make my life miserable
;)).

>> - page table formats are the main cause of incompatibility: arm32
>> and x86-32 require three-level tables for certain features, but those
>> are incompatible with older cores, arm64 supports three different
>> page sizes, but none of them works on all cores (4KB almost works
>> everywhere).

We actually have some support on the works for multiple page table
levels in a single binary, which should help with a lot of that
incompatibility. I don't know of any plans to couple other page table
features to the number of levels, though.

>> - SMP-enabled ARMv7 kernels can be configured to run on either
>> ARMv6 or ARMv8, but not both, in this case because of incompatible
>> barrier instructions.

Our barriers aren't quite split the same way, but we do have two memory
models (RVWMO and TSO). IIUC we should be able to support both in the
same kernels with some patching, but the resulting kernels would be
biased towards one memory models over the other WRT performance. Again,
we'll have to see what the vendors do and I'm hoping we don't end up
with too many headaches.

>> - 32-bit Arm has a couple more remaining features that require building
>> a machine specific kernel if enabled because they hardcode physical
>> addresses: early printk (debug_ll, not the normal earlycon), NOMMU,
>> and XIP.

We've got NOMMU and XIP as well, but we have some SBI support for early
printk. IMO we're not really sure if we've decoupled all the PA layout
dependencies yet from Linux, as we really only support one vendor's
systems, but we've had a lot of work lately on beefing up our memory
layout so with any luck we'll be able to quickly sort out anything that
comes up.

2021-06-04 21:29:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Fri, Jun 4, 2021 at 6:14 PM Palmer Dabbelt <[email protected]> wrote:

> The hope here has always been that we'd have enough in the standards
> that we could avoid a proliferation of vendor-specific code. We've
> always put a strong "things keep working forever" stake in the ground in
> RISC-V land, but that's largely been because we were counting on the
> standards existing that make support easy. In practice we don't have
> those standards so we're ending up with a fairly large software base
> that is required to support everything. We don't have all that much
> hardware right now so we'll have to see how it goes, but for now I'm in
> favor of keeping defconfig as a "boots on everything" sort of setup --
> both because it makes life easier for users, and because it makes issues
> like the non-portable Kconfigs that showed up here quite explicit.

It's obviously easy to take the hard line approach as long as there is
so little hardware available. I expect this to be a constant struggle,
but it's definitely worth trying as long as you can.

> >> To give some common examples that make it break down:
> >>
> >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on
> >> most other architectures)
>
> Yes, and there's no way around that on RISC-V. They're different base
> ISAs therefor re-define the same instructions, so we're essentially at
> two kernel binaries by that point. The platform spec says rv64gc, so we
> can kind of punt on this one for now. If rv32 hardware shows up
> we'll probably want a standard system there too, which is why we've
> avoided coupling kernel portability to XLEN.

I would actually put 32-bit into the same category as NOMMU, XIP
and the built-in DTB:
Since it seems unrealistic to expect an rv32 Debian or Fedora build,
there is very little to gain by enforcing compatibility between machines.
This is different from 32-bit Arm, which is widely used across multiple
distros and many SoCs.

> >> - architectures that support both big-endian and little-endian kernels
> >> tend to have platforms that require one or the other (e.g. mips,
> >> though not arm). Not an issue for you.
>
> It is now! We've added big-endian to RISC-V. There's no hardware yet
> and very little software support. IMO the right answer is to ban that
> from the platform spec, but again it'll depnd on what vendors want to
> build (though anyone is listening, please don't make my life miserable
> ;)).

I don't see any big-endian support in linux-next. This one does seem
worth enforcing to be kept out, as it would double the number of user
space ABIs, not just kernel configurations. On arm64, I think the general
feeling is now that we would have been better off not merging big-endian
support into the kernel as an option, but it still seemed important at the
time. Not that there is anything really wrong with big-endian by itself,
just that there is no use case that is worth the added complexity of
supporting both.

Let me know if there are patches you want me to Nak in the future ;-)

> >> - SMP-enabled ARMv7 kernels can be configured to run on either
> >> ARMv6 or ARMv8, but not both, in this case because of incompatible
> >> barrier instructions.
>
> Our barriers aren't quite split the same way, but we do have two memory
> models (RVWMO and TSO). IIUC we should be able to support both in the
> same kernels with some patching, but the resulting kernels would be
> biased towards one memory models over the other WRT performance. Again,
> we'll have to see what the vendors do and I'm hoping we don't end up
> with too many headaches.

I wouldn't specifically expect the problem to be barriers in the rv64 case,
this was just an example of instruction sets slowly changing in incompatible
ways over a long time. There might be an important reason for version 3.0
of one of the specifications to drop compatibility with version 1.x.

Arnd

2021-06-04 22:15:06

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Fri, 04 Jun 2021 14:26:11 PDT (-0700), Arnd Bergmann wrote:
> On Fri, Jun 4, 2021 at 6:14 PM Palmer Dabbelt <[email protected]> wrote:
>> >> To give some common examples that make it break down:
>> >>
>> >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on
>> >> most other architectures)
>>
>> Yes, and there's no way around that on RISC-V. They're different base
>> ISAs therefor re-define the same instructions, so we're essentially at
>> two kernel binaries by that point. The platform spec says rv64gc, so we
>> can kind of punt on this one for now. If rv32 hardware shows up
>> we'll probably want a standard system there too, which is why we've
>> avoided coupling kernel portability to XLEN.
>
> I would actually put 32-bit into the same category as NOMMU, XIP
> and the built-in DTB:
> Since it seems unrealistic to expect an rv32 Debian or Fedora build,
> there is very little to gain by enforcing compatibility between machines.
> This is different from 32-bit Arm, which is widely used across multiple
> distros and many SoCs.

OK, well, that's what the spec says already. Maybe the right answer is
to just add that "be compatible with the platform spec" Kconfig and have
it also enforce rv64gc like the spec says.

>
>> >> - architectures that support both big-endian and little-endian kernels
>> >> tend to have platforms that require one or the other (e.g. mips,
>> >> though not arm). Not an issue for you.
>>
>> It is now! We've added big-endian to RISC-V. There's no hardware yet
>> and very little software support. IMO the right answer is to ban that
>> from the platform spec, but again it'll depnd on what vendors want to
>> build (though anyone is listening, please don't make my life miserable
>> ;)).
>
> I don't see any big-endian support in linux-next. This one does seem
> worth enforcing to be kept out, as it would double the number of user
> space ABIs, not just kernel configurations. On arm64, I think the general
> feeling is now that we would have been better off not merging big-endian
> support into the kernel as an option, but it still seemed important at the
> time. Not that there is anything really wrong with big-endian by itself,
> just that there is no use case that is worth the added complexity of
> supporting both.
>
> Let me know if there are patches you want me to Nak in the future ;-)

Sorry, by "added big-endian to RISC-V" I meant to the ISA, not to Linux.
We haven't had any interesting in adding it to Linux. The interest has
all been in the embedded space.

2021-06-06 17:13:31

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Hi Anup and Atish,

On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Palmer Dabbelt <[email protected]>
> > Sent: 03 June 2021 09:43
> > To: [email protected]
> > Cc: [email protected]; [email protected]; Christoph Hellwig
> > <[email protected]>; Anup Patel <[email protected]>; [email protected];
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; Paul Walmsley
> > <[email protected]>
> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> >
> > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
> > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]>
> > wrote:
> > >>>
> > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > <[email protected]> wrote:
> > >>> >
> > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
> > wrote:
> > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > >>> > > > Since the existing RISC-V ISA cannot solve this problem, it is
> > >>> > > > better to provide some configuration for the SOC vendor to
> > customize.
> > >>> > >
> > >>> > > We've been talking about this problem for close to five years.
> > >>> > > So no, if you don't manage to get the feature into the ISA it
> > >>> > > can't be supported.
> > >>> >
> > >>> > Isn't it a good goal for Linux to support the capabilities present
> > >>> > in the SoC that a currently being fab'd?
> > >>> >
> > >>> > I believe the CMO group only started last year [1] so the RV64GC
> > >>> > SoCs that are going into mass production this year would not have
> > >>> > had the opporuntiy of utilizing any RISC-V ISA extension for
> > >>> > handling cache management.
> > >>>
> > >>> The current Linux RISC-V policy is to only accept patches for frozen
> > >>> or ratified ISA specs.
> > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > >>>
> > >>> This means even if emulate CMO instructions in OpenSBI, the Linux
> > >>> patches won't be taken by Palmer because CMO specification is still
> > >>> in draft stage.
> > >> Before CMO specification release, could we use a sbi_ecall to solve
> > >> the current problem? This is not against the specification, when CMO
> > >> is ready we could let users choose to use the new CMO in Linux.
> > >>
> > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > >>
> > >>>
> > >>> Also, we all know how much time it takes for RISCV international to
> > >>> freeze some spec. Judging by that we are looking at another
> > >>> 3-4 years at minimum.
> > >
> > > Sorry for being slow here, this thread got buried.
> > >
> > > I've been trying to work with a handful of folks at the RISC-V
> > > foundation to try and get a subset of the various in-development
> > > specifications (some simple CMOs, something about non-caching in the
> > > page tables, and some way to prevent speculative accesse from
> > > generating coherence traffic that will break non-coherent systems).
> > > I'm not sure we can get this together quickly, but I'd prefer to at
> > > least try before we jump to taking vendor-specificed behavior here.
> > > It's obviously an up-hill battle to try and get specifications through
> > > the process and I'm certainly not going to promise it will work, but
> > > I'm hoping that the impending need to avoid forking the ISA will be
> > > sufficient to get people behind producing some specifications in a timely
> > fashion.
> > >
> > > I wasn't aware than this chip had non-coherent devices until I saw
> > > this thread, so we'd been mostly focused on the Beagle V chip. That
> > > was in a sense an easier problem because the SiFive IP in it was never
> > > designed to have non-coherent devices so we'd have to make anything
> > > work via a series of slow workarounds, which would make emulating the
> > > eventually standardized behavior reasonable in terms of performance
> > > (ie, everything would be super slow so who really cares).
> > >
> > > I don't think relying on some sort of SBI call for the CMOs whould be
> > > such a performance hit that it would prevent these systems from being
> > > viable, but assuming you have reasonable performance on your
> > > non-cached accesses then that's probably not going to be viable to
> > > trap and emulate. At that point it really just becomes silly to
> > > pretend that we're still making things work by emulating the
> > > eventually ratified behavior, as anyone who actually tries to use this
> > > thing to do IO would need out of tree patches. I'm not sure exactly
> > > what the plan is for the page table bits in the specification right
> > > now, but if you can give me a pointer to some documentation then I'm
> > > happy to try and push for something compatible.
> > >
> > > If we can't make the process work at the foundation then I'd be
> > > strongly in favor of just biting the bullet and starting to take
> > > vendor-specific code that's been implemented in hardware and is
> > > necessarry to make things work acceptably. That's obviously a
> > > sub-optimal solution as it'll lead to a bunch of ISA fragmentation,
> > > but at least we'll be able to keep the software stack together.
> > >
> > > Can you tell us when these will be in the hands of users? That's
> > > pretty important here, as I don't want to be blocking real users from
> > > having their hardware work. IIRC there were some plans to distribute
> > > early boards, but it looks like the foundation got involved and I
> > > guess I lost the thread at that point.
> > >
> > > Sorry this is all such a headache, but hopefully we can get things
> > > sorted out.
> >
> > I talked with some of the RISC-V foundation folks, we're not going to have an
> > ISA specification for the non-coherent stuff any time soon. I took a look at
> > this code and I definately don't want to take it as is, but I'm not opposed to
> > taking something that makes the hardware work as long as it's a lot cleaner.
> > We've already got two of these non-coherent chips, I'm sure more will come,
> > and I'd rather have the extra headaches than make everyone fork the software
> > stack.
>
> Thanks for confirming. The CMO extension is still in early stages so it will
> certainly take more time for them. After CMO extension is finalized, it will
> take some more time to have actual RISC-V platforms with CMO implemented.
>
> >
> > After talking to Atish it looks like there's likely to be an SBI extension to
> > handle the CMOs, which should let us avoid the bulk of the vendor-specific
> > behavior in the kernel. I know some people are worried about adding to the
> > SBI surface. I'm worried about that too, but that's way better than sticking a
> > bunch of vendor-specific instructions into the kernel. The SBI extension
> > should make for a straight-forward cache flush implementation in Linux, so
> > let's just plan on that getting through quickly (as has been done before).
>
> Yes, I agree. We can have just a single SBI call which is meant for DMA sync
> purpose only which means it will flush/invalidate pages from all cache
> levels irrespective of the cache hierarchy (i.e. flush/invalidate to RAM). The
> CMO extension might more generic cache operations which can target
> any cache level.
>
> I am already preparing a write-up for SBI DMA sync call in SBI spec. I will
> share it with you separately as well.
>
> >
> > Unfortunately we've yet to come up with a way to handle the non-cacheable
> > mappings without introducing a degree of vendor-specific behavior or
> > seriously impacting performance (mark them as not valid and deal with them
> > in the trap handler). I'm not really sure it counts as supporting the hardware
> > if it's massively slow, so that really leaves us with vendor-specific mappings as
> > the only option to make these chips work.
>
> A RISC-V platform can have non-cacheable mappings is following possible
> ways:
> 1) Fixed physical address range as non-cacheable using PMAs
> 2) Custom page table attributes
> 3) Svpmbt extension being defined by RVI
>
> Atish and me both think it is possible to have RISC-V specific DMA ops
> implementation which can handle all above case. Atish is already working
> on DMA ops implementation for RISC-V.
Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please
have a look at:
https://lore.kernel.org/linux-riscv/[email protected]/T/#u


>
> >
> > This implementation, which adds some Kconfig entries that control page table
> > bits, definately isn't suitable for upstream. Allowing users to set arbitrary
> > page table bits will eventually conflict with the standard, and is just going to
> > be a mess. It'll also lead to kernels that are only compatible with specific
> > designs, which we're trying very hard to avoid. At a bare minimum we'll need
> > some way to detect systems with these page table bits before setting them,
> > and some description of what the bits actually do so we can reason about
> > them.
>
> Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
> goal of unified kernel image for all platforms.
>
> Regards,
> Anup



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-06 18:18:16

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Στις 2021-05-20 04:45, Guo Ren έγραψε:
> On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
>>
>> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
>> > This patch series looks like it might be useful for the StarFive JH7100
>> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
>> > USB and SDIO require that the L2 cache must be manually flushed after
>> > DMA operations if the data is intended to be shared with U74 cores [2].
>>
>> Not too much, given that the SiFive lineage CPUs have an uncached
>> window, that is a totally different way to allocate uncached memory.
> It's a very big MIPS smell. What's the attribute of the uncached
> window? (uncached + strong-order/ uncached + weak, most vendors still
> use AXI interconnect, how to deal with a bufferable attribute?) In
> fact, customers' drivers use different ways to deal with DMA memory in
> non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> the same way in DMA memory is a smart choice. So using PTE attributes
> is more suitable.
>
> See:
> https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
> 4.4.1
> The draft supports custom attribute bits in PTE.
>

Not only it doesn't support custom attributes on PTEs:

"Bits63–54 are reserved for future standard use and must be zeroed by
software for forward compatibility."

It also goes further to say that:

"if any of these bits are set, a page-fault exception is raised"

2021-06-07 00:07:45

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]> wrote:
>
> Στις 2021-05-20 04:45, Guo Ren έγραψε:
> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
> >>
> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> >> > This patch series looks like it might be useful for the StarFive JH7100
> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> >> > USB and SDIO require that the L2 cache must be manually flushed after
> >> > DMA operations if the data is intended to be shared with U74 cores [2].
> >>
> >> Not too much, given that the SiFive lineage CPUs have an uncached
> >> window, that is a totally different way to allocate uncached memory.
> > It's a very big MIPS smell. What's the attribute of the uncached
> > window? (uncached + strong-order/ uncached + weak, most vendors still
> > use AXI interconnect, how to deal with a bufferable attribute?) In
> > fact, customers' drivers use different ways to deal with DMA memory in
> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> > the same way in DMA memory is a smart choice. So using PTE attributes
> > is more suitable.
> >
> > See:
> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
> > 4.4.1
> > The draft supports custom attribute bits in PTE.
> >
>
> Not only it doesn't support custom attributes on PTEs:
>
> "Bits63–54 are reserved for future standard use and must be zeroed by
> software for forward compatibility."
>
> It also goes further to say that:
>
> "if any of these bits are set, a page-fault exception is raised"

In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to
support it.

Let Linux support custom PTE attributes won't get any side effect in practice.

IMO:
We needn't waste a bit in PTE, but the custom idea in PTE reserved
bits is necessary. Because Allwinner D1 needs custom PTE bits in
reserved bits to work around.
So I recommend just remove the "C" bit in PTE, but allow vendors to
define their own PTE attributes in reserved bits. I've found a way to
compact different PTE attributes of different vendors during the Linux
boot stage. That means we still could use One Image for all vendors in
Linux



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 02:19:38

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Στις 2021-06-07 03:04, Guo Ren έγραψε:
> On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]>
> wrote:
>>
>> Στις 2021-05-20 04:45, Guo Ren έγραψε:
>> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
>> >>
>> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
>> >> > This patch series looks like it might be useful for the StarFive JH7100
>> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
>> >> > USB and SDIO require that the L2 cache must be manually flushed after
>> >> > DMA operations if the data is intended to be shared with U74 cores [2].
>> >>
>> >> Not too much, given that the SiFive lineage CPUs have an uncached
>> >> window, that is a totally different way to allocate uncached memory.
>> > It's a very big MIPS smell. What's the attribute of the uncached
>> > window? (uncached + strong-order/ uncached + weak, most vendors still
>> > use AXI interconnect, how to deal with a bufferable attribute?) In
>> > fact, customers' drivers use different ways to deal with DMA memory in
>> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
>> > the same way in DMA memory is a smart choice. So using PTE attributes
>> > is more suitable.
>> >
>> > See:
>> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
>> > 4.4.1
>> > The draft supports custom attribute bits in PTE.
>> >
>>
>> Not only it doesn't support custom attributes on PTEs:
>>
>> "Bits63–54 are reserved for future standard use and must be zeroed by
>> software for forward compatibility."
>>
>> It also goes further to say that:
>>
>> "if any of these bits are set, a page-fault exception is raised"
>
> In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to
> support it.
>

The C-bit was recently dropped, there is a new proposal for Page Based
Memory Attributes (PBMT) that we can work on / push for.

> Let Linux support custom PTE attributes won't get any side effect in
> practice.
>
> IMO:
> We needn't waste a bit in PTE, but the custom idea in PTE reserved
> bits is necessary. Because Allwinner D1 needs custom PTE bits in
> reserved bits to work around.
> So I recommend just remove the "C" bit in PTE, but allow vendors to
> define their own PTE attributes in reserved bits. I've found a way to
> compact different PTE attributes of different vendors during the Linux
> boot stage. That means we still could use One Image for all vendors in
> Linux

The spec is clear, those attributes are for standard use only, not for
custom/platform use. It's one thing to implement custom CMOs where the
ISA doesn't have anything for it and doesn't prevent you to do so (so
you are not violating anything, it's just a custom extension), and we
can hide them behind SBI calls etc, and another to violate the current
Privilege Spec by using bits on PTEs that are reserved for standard use
only. The intentions of the VM TG are clear, not only those bits are
reserved but if software uses them the hw will result a page fault in
future revisions of the spec. What's the idea here, to support
non-compliant implementations on mainline ? I'm sure you have a good
idea on how to make this work, but as long as it violates the spec it
can't go in IMHO.

2021-06-07 03:23:31

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 10:16 AM Nick Kossifidis <[email protected]> wrote:
>
> Στις 2021-06-07 03:04, Guo Ren έγραψε:
> > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]>
> > wrote:
> >>
> >> Στις 2021-05-20 04:45, Guo Ren έγραψε:
> >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
> >> >>
> >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> >> >> > This patch series looks like it might be useful for the StarFive JH7100
> >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> >> >> > USB and SDIO require that the L2 cache must be manually flushed after
> >> >> > DMA operations if the data is intended to be shared with U74 cores [2].
> >> >>
> >> >> Not too much, given that the SiFive lineage CPUs have an uncached
> >> >> window, that is a totally different way to allocate uncached memory.
> >> > It's a very big MIPS smell. What's the attribute of the uncached
> >> > window? (uncached + strong-order/ uncached + weak, most vendors still
> >> > use AXI interconnect, how to deal with a bufferable attribute?) In
> >> > fact, customers' drivers use different ways to deal with DMA memory in
> >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> >> > the same way in DMA memory is a smart choice. So using PTE attributes
> >> > is more suitable.
> >> >
> >> > See:
> >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
> >> > 4.4.1
> >> > The draft supports custom attribute bits in PTE.
> >> >
> >>
> >> Not only it doesn't support custom attributes on PTEs:
> >>
> >> "Bits63–54 are reserved for future standard use and must be zeroed by
> >> software for forward compatibility."
> >>
> >> It also goes further to say that:
> >>
> >> "if any of these bits are set, a page-fault exception is raised"
> >
> > In RISC-V VM TG, A C-bit discussion is raised. So it's a comm idea to
> > support it.
> >
>
> The C-bit was recently dropped, there is a new proposal for Page Based
> Memory Attributes (PBMT) that we can work on / push for.
C-bit still needs discussion, we shouldn't drop it directly.

>
> > Let Linux support custom PTE attributes won't get any side effect in
> > practice.
> >
> > IMO:
> > We needn't waste a bit in PTE, but the custom idea in PTE reserved
> > bits is necessary. Because Allwinner D1 needs custom PTE bits in
> > reserved bits to work around.
> > So I recommend just remove the "C" bit in PTE, but allow vendors to
> > define their own PTE attributes in reserved bits. I've found a way to
> > compact different PTE attributes of different vendors during the Linux
> > boot stage. That means we still could use One Image for all vendors in
> > Linux
>
> The spec is clear, those attributes are for standard use only, not for
> custom/platform use. It's one thing to implement custom CMOs where the
> ISA doesn't have anything for it and doesn't prevent you to do so (so
> you are not violating anything, it's just a custom extension), and we
> can hide them behind SBI calls etc, and another to violate the current
> Privilege Spec by using bits on PTEs that are reserved for standard use
> only. The intentions of the VM TG are clear, not only those bits are
> reserved but if software uses them the hw will result a page fault in
> future revisions of the spec. What's the idea here, to support
> non-compliant implementations on mainline ?
Raise a page fault won't solve anything. We still need access to the
page with proper performance.

> I'm sure you have a good
> idea on how to make this work, but as long as it violates the spec it
> can't go in IMHO.

We need PTEs to provide a non-coherency solution, and only CMO
instructions are not enough. We can't modify so many Linux drivers to
fit it.
From Linux non-coherency view, we need:
- Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors
- Non-cache + weak order to deal with framebuffer drivers
- CMO dma_sync to sync cache with DMA devices
- Userspace icache_sync solution, which prevents calls to S-mode with
IPI fence.i. (Necessary to JIT java scenarios.)

All above are not in spec, but the real chips are done.
(Actually, these have been talked about for more than five years, we
still haven't the uniform idea.)

The idea of C-bit is really important for us which prevents our chips
violates the spec.

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 03:39:40

by Anup Patel

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support



> -----Original Message-----
> From: Guo Ren <[email protected]>
> Sent: 06 June 2021 22:42
> To: Anup Patel <[email protected]>; Atish Patra <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>; [email protected];
> [email protected]; Christoph Hellwig <[email protected]>; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; Paul Walmsley
> <[email protected]>
> Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
>
> Hi Anup and Atish,
>
> On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Palmer Dabbelt <[email protected]>
> > > Sent: 03 June 2021 09:43
> > > To: [email protected]
> > > Cc: [email protected]; [email protected]; Christoph Hellwig
> > > <[email protected]>; Anup Patel <[email protected]>; [email protected];
> > > [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; linux-
> > > [email protected]; [email protected]; Paul Walmsley
> > > <[email protected]>
> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > >
> > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
> > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]>
> > > wrote:
> > > >>>
> > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > > <[email protected]> wrote:
> > > >>> >
> > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
> > > wrote:
> > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > >>> > > > Since the existing RISC-V ISA cannot solve this problem,
> > > >>> > > > it is better to provide some configuration for the SOC
> > > >>> > > > vendor to
> > > customize.
> > > >>> > >
> > > >>> > > We've been talking about this problem for close to five years.
> > > >>> > > So no, if you don't manage to get the feature into the ISA
> > > >>> > > it can't be supported.
> > > >>> >
> > > >>> > Isn't it a good goal for Linux to support the capabilities
> > > >>> > present in the SoC that a currently being fab'd?
> > > >>> >
> > > >>> > I believe the CMO group only started last year [1] so the
> > > >>> > RV64GC SoCs that are going into mass production this year
> > > >>> > would not have had the opporuntiy of utilizing any RISC-V ISA
> > > >>> > extension for handling cache management.
> > > >>>
> > > >>> The current Linux RISC-V policy is to only accept patches for
> > > >>> frozen or ratified ISA specs.
> > > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > > >>>
> > > >>> This means even if emulate CMO instructions in OpenSBI, the
> > > >>> Linux patches won't be taken by Palmer because CMO specification
> > > >>> is still in draft stage.
> > > >> Before CMO specification release, could we use a sbi_ecall to
> > > >> solve the current problem? This is not against the specification,
> > > >> when CMO is ready we could let users choose to use the new CMO in
> Linux.
> > > >>
> > > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > > >>
> > > >>>
> > > >>> Also, we all know how much time it takes for RISCV international
> > > >>> to freeze some spec. Judging by that we are looking at another
> > > >>> 3-4 years at minimum.
> > > >
> > > > Sorry for being slow here, this thread got buried.
> > > >
> > > > I've been trying to work with a handful of folks at the RISC-V
> > > > foundation to try and get a subset of the various in-development
> > > > specifications (some simple CMOs, something about non-caching in
> > > > the page tables, and some way to prevent speculative accesse from
> > > > generating coherence traffic that will break non-coherent systems).
> > > > I'm not sure we can get this together quickly, but I'd prefer to
> > > > at least try before we jump to taking vendor-specificed behavior here.
> > > > It's obviously an up-hill battle to try and get specifications
> > > > through the process and I'm certainly not going to promise it will
> > > > work, but I'm hoping that the impending need to avoid forking the
> > > > ISA will be sufficient to get people behind producing some
> > > > specifications in a timely
> > > fashion.
> > > >
> > > > I wasn't aware than this chip had non-coherent devices until I saw
> > > > this thread, so we'd been mostly focused on the Beagle V chip.
> > > > That was in a sense an easier problem because the SiFive IP in it
> > > > was never designed to have non-coherent devices so we'd have to
> > > > make anything work via a series of slow workarounds, which would
> > > > make emulating the eventually standardized behavior reasonable in
> > > > terms of performance (ie, everything would be super slow so who really
> cares).
> > > >
> > > > I don't think relying on some sort of SBI call for the CMOs whould
> > > > be such a performance hit that it would prevent these systems from
> > > > being viable, but assuming you have reasonable performance on your
> > > > non-cached accesses then that's probably not going to be viable to
> > > > trap and emulate. At that point it really just becomes silly to
> > > > pretend that we're still making things work by emulating the
> > > > eventually ratified behavior, as anyone who actually tries to use
> > > > this thing to do IO would need out of tree patches. I'm not sure
> > > > exactly what the plan is for the page table bits in the
> > > > specification right now, but if you can give me a pointer to some
> > > > documentation then I'm happy to try and push for something
> compatible.
> > > >
> > > > If we can't make the process work at the foundation then I'd be
> > > > strongly in favor of just biting the bullet and starting to take
> > > > vendor-specific code that's been implemented in hardware and is
> > > > necessarry to make things work acceptably. That's obviously a
> > > > sub-optimal solution as it'll lead to a bunch of ISA
> > > > fragmentation, but at least we'll be able to keep the software stack
> together.
> > > >
> > > > Can you tell us when these will be in the hands of users? That's
> > > > pretty important here, as I don't want to be blocking real users
> > > > from having their hardware work. IIRC there were some plans to
> > > > distribute early boards, but it looks like the foundation got
> > > > involved and I guess I lost the thread at that point.
> > > >
> > > > Sorry this is all such a headache, but hopefully we can get things
> > > > sorted out.
> > >
> > > I talked with some of the RISC-V foundation folks, we're not going
> > > to have an ISA specification for the non-coherent stuff any time
> > > soon. I took a look at this code and I definately don't want to
> > > take it as is, but I'm not opposed to taking something that makes the
> hardware work as long as it's a lot cleaner.
> > > We've already got two of these non-coherent chips, I'm sure more
> > > will come, and I'd rather have the extra headaches than make
> > > everyone fork the software stack.
> >
> > Thanks for confirming. The CMO extension is still in early stages so
> > it will certainly take more time for them. After CMO extension is
> > finalized, it will take some more time to have actual RISC-V platforms with
> CMO implemented.
> >
> > >
> > > After talking to Atish it looks like there's likely to be an SBI
> > > extension to handle the CMOs, which should let us avoid the bulk of
> > > the vendor-specific behavior in the kernel. I know some people are
> > > worried about adding to the SBI surface. I'm worried about that
> > > too, but that's way better than sticking a bunch of vendor-specific
> > > instructions into the kernel. The SBI extension should make for a
> > > straight-forward cache flush implementation in Linux, so let's just plan on
> that getting through quickly (as has been done before).
> >
> > Yes, I agree. We can have just a single SBI call which is meant for
> > DMA sync purpose only which means it will flush/invalidate pages from
> > all cache levels irrespective of the cache hierarchy (i.e.
> > flush/invalidate to RAM). The CMO extension might more generic cache
> > operations which can target any cache level.
> >
> > I am already preparing a write-up for SBI DMA sync call in SBI spec. I
> > will share it with you separately as well.
> >
> > >
> > > Unfortunately we've yet to come up with a way to handle the
> > > non-cacheable mappings without introducing a degree of
> > > vendor-specific behavior or seriously impacting performance (mark
> > > them as not valid and deal with them in the trap handler). I'm not
> > > really sure it counts as supporting the hardware if it's massively
> > > slow, so that really leaves us with vendor-specific mappings as the only
> option to make these chips work.
> >
> > A RISC-V platform can have non-cacheable mappings is following
> > possible
> > ways:
> > 1) Fixed physical address range as non-cacheable using PMAs
> > 2) Custom page table attributes
> > 3) Svpmbt extension being defined by RVI
> >
> > Atish and me both think it is possible to have RISC-V specific DMA ops
> > implementation which can handle all above case. Atish is already
> > working on DMA ops implementation for RISC-V.
> Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a
> look at:
> https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email-
> [email protected]/T/#u

The icache_sync and __vdso_icache_sync will have to be addressed
differently. The SBI DMA sync extension cannot address this.

It seems Allwinner D1 have more non-standard stuff:
1) Custom PTE bits for IO-coherent access
2) Custom data cache flush/invalidate for DMA sync
3) Custom icache flush/invalidate

Other hand, BeagleV has only two problems:
1) Custom physical address range for IO-coherent access
2) Custom L2 cache flush/invalidate for DMA sync

From above #2, can be solved by SBI DMA sync call and
Linux DMA ops for both BeagleV and Allwinner D1

On BeagleV, issue #1 can be solved using "dma-ranges".

On Allwinner D1, issues #1 and #3 need to be addressed
separately.

I think supporting BeagleV in upstream Linux is relatively
easy compared to Allwinner D1.

@Guo, please check if you can reserve dedicated
physical address range for IO-coherent access (just like
BeagleV). If yes, then we can tackle issue #1 for Allwinner
D1 using "dma-ranges" DT property.

Regards,
Anup

>
>
> >
> > >
> > > This implementation, which adds some Kconfig entries that control
> > > page table bits, definately isn't suitable for upstream. Allowing
> > > users to set arbitrary page table bits will eventually conflict with
> > > the standard, and is just going to be a mess. It'll also lead to
> > > kernels that are only compatible with specific designs, which we're
> > > trying very hard to avoid. At a bare minimum we'll need some way to
> > > detect systems with these page table bits before setting them, and
> > > some description of what the bits actually do so we can reason about
> them.
> >
> > Yes, vendor specific Kconfig options are strict NO NO. We can't
> > give-up the goal of unified kernel image for all platforms.
> >
> > Regards,
> > Anup
>
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

2021-06-07 04:26:41

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Hi Anup,

On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Guo Ren <[email protected]>
> > Sent: 06 June 2021 22:42
> > To: Anup Patel <[email protected]>; Atish Patra <[email protected]>
> > Cc: Palmer Dabbelt <[email protected]>; [email protected];
> > [email protected]; Christoph Hellwig <[email protected]>; [email protected];
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; Paul Walmsley
> > <[email protected]>
> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> >
> > Hi Anup and Atish,
> >
> > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Palmer Dabbelt <[email protected]>
> > > > Sent: 03 June 2021 09:43
> > > > To: [email protected]
> > > > Cc: [email protected]; [email protected]; Christoph Hellwig
> > > > <[email protected]>; Anup Patel <[email protected]>; [email protected];
> > > > [email protected]; [email protected]; linux-
> > > > [email protected]; [email protected]; linux-
> > > > [email protected]; [email protected]; Paul Walmsley
> > > > <[email protected]>
> > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > >
> > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected] wrote:
> > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel <[email protected]>
> > > > wrote:
> > > > >>>
> > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > > > <[email protected]> wrote:
> > > > >>> >
> > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph Hellwig
> > > > wrote:
> > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren wrote:
> > > > >>> > > > Since the existing RISC-V ISA cannot solve this problem,
> > > > >>> > > > it is better to provide some configuration for the SOC
> > > > >>> > > > vendor to
> > > > customize.
> > > > >>> > >
> > > > >>> > > We've been talking about this problem for close to five years.
> > > > >>> > > So no, if you don't manage to get the feature into the ISA
> > > > >>> > > it can't be supported.
> > > > >>> >
> > > > >>> > Isn't it a good goal for Linux to support the capabilities
> > > > >>> > present in the SoC that a currently being fab'd?
> > > > >>> >
> > > > >>> > I believe the CMO group only started last year [1] so the
> > > > >>> > RV64GC SoCs that are going into mass production this year
> > > > >>> > would not have had the opporuntiy of utilizing any RISC-V ISA
> > > > >>> > extension for handling cache management.
> > > > >>>
> > > > >>> The current Linux RISC-V policy is to only accept patches for
> > > > >>> frozen or ratified ISA specs.
> > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > > > >>>
> > > > >>> This means even if emulate CMO instructions in OpenSBI, the
> > > > >>> Linux patches won't be taken by Palmer because CMO specification
> > > > >>> is still in draft stage.
> > > > >> Before CMO specification release, could we use a sbi_ecall to
> > > > >> solve the current problem? This is not against the specification,
> > > > >> when CMO is ready we could let users choose to use the new CMO in
> > Linux.
> > > > >>
> > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > > > >>
> > > > >>>
> > > > >>> Also, we all know how much time it takes for RISCV international
> > > > >>> to freeze some spec. Judging by that we are looking at another
> > > > >>> 3-4 years at minimum.
> > > > >
> > > > > Sorry for being slow here, this thread got buried.
> > > > >
> > > > > I've been trying to work with a handful of folks at the RISC-V
> > > > > foundation to try and get a subset of the various in-development
> > > > > specifications (some simple CMOs, something about non-caching in
> > > > > the page tables, and some way to prevent speculative accesse from
> > > > > generating coherence traffic that will break non-coherent systems).
> > > > > I'm not sure we can get this together quickly, but I'd prefer to
> > > > > at least try before we jump to taking vendor-specificed behavior here.
> > > > > It's obviously an up-hill battle to try and get specifications
> > > > > through the process and I'm certainly not going to promise it will
> > > > > work, but I'm hoping that the impending need to avoid forking the
> > > > > ISA will be sufficient to get people behind producing some
> > > > > specifications in a timely
> > > > fashion.
> > > > >
> > > > > I wasn't aware than this chip had non-coherent devices until I saw
> > > > > this thread, so we'd been mostly focused on the Beagle V chip.
> > > > > That was in a sense an easier problem because the SiFive IP in it
> > > > > was never designed to have non-coherent devices so we'd have to
> > > > > make anything work via a series of slow workarounds, which would
> > > > > make emulating the eventually standardized behavior reasonable in
> > > > > terms of performance (ie, everything would be super slow so who really
> > cares).
> > > > >
> > > > > I don't think relying on some sort of SBI call for the CMOs whould
> > > > > be such a performance hit that it would prevent these systems from
> > > > > being viable, but assuming you have reasonable performance on your
> > > > > non-cached accesses then that's probably not going to be viable to
> > > > > trap and emulate. At that point it really just becomes silly to
> > > > > pretend that we're still making things work by emulating the
> > > > > eventually ratified behavior, as anyone who actually tries to use
> > > > > this thing to do IO would need out of tree patches. I'm not sure
> > > > > exactly what the plan is for the page table bits in the
> > > > > specification right now, but if you can give me a pointer to some
> > > > > documentation then I'm happy to try and push for something
> > compatible.
> > > > >
> > > > > If we can't make the process work at the foundation then I'd be
> > > > > strongly in favor of just biting the bullet and starting to take
> > > > > vendor-specific code that's been implemented in hardware and is
> > > > > necessarry to make things work acceptably. That's obviously a
> > > > > sub-optimal solution as it'll lead to a bunch of ISA
> > > > > fragmentation, but at least we'll be able to keep the software stack
> > together.
> > > > >
> > > > > Can you tell us when these will be in the hands of users? That's
> > > > > pretty important here, as I don't want to be blocking real users
> > > > > from having their hardware work. IIRC there were some plans to
> > > > > distribute early boards, but it looks like the foundation got
> > > > > involved and I guess I lost the thread at that point.
> > > > >
> > > > > Sorry this is all such a headache, but hopefully we can get things
> > > > > sorted out.
> > > >
> > > > I talked with some of the RISC-V foundation folks, we're not going
> > > > to have an ISA specification for the non-coherent stuff any time
> > > > soon. I took a look at this code and I definately don't want to
> > > > take it as is, but I'm not opposed to taking something that makes the
> > hardware work as long as it's a lot cleaner.
> > > > We've already got two of these non-coherent chips, I'm sure more
> > > > will come, and I'd rather have the extra headaches than make
> > > > everyone fork the software stack.
> > >
> > > Thanks for confirming. The CMO extension is still in early stages so
> > > it will certainly take more time for them. After CMO extension is
> > > finalized, it will take some more time to have actual RISC-V platforms with
> > CMO implemented.
> > >
> > > >
> > > > After talking to Atish it looks like there's likely to be an SBI
> > > > extension to handle the CMOs, which should let us avoid the bulk of
> > > > the vendor-specific behavior in the kernel. I know some people are
> > > > worried about adding to the SBI surface. I'm worried about that
> > > > too, but that's way better than sticking a bunch of vendor-specific
> > > > instructions into the kernel. The SBI extension should make for a
> > > > straight-forward cache flush implementation in Linux, so let's just plan on
> > that getting through quickly (as has been done before).
> > >
> > > Yes, I agree. We can have just a single SBI call which is meant for
> > > DMA sync purpose only which means it will flush/invalidate pages from
> > > all cache levels irrespective of the cache hierarchy (i.e.
> > > flush/invalidate to RAM). The CMO extension might more generic cache
> > > operations which can target any cache level.
> > >
> > > I am already preparing a write-up for SBI DMA sync call in SBI spec. I
> > > will share it with you separately as well.
> > >
> > > >
> > > > Unfortunately we've yet to come up with a way to handle the
> > > > non-cacheable mappings without introducing a degree of
> > > > vendor-specific behavior or seriously impacting performance (mark
> > > > them as not valid and deal with them in the trap handler). I'm not
> > > > really sure it counts as supporting the hardware if it's massively
> > > > slow, so that really leaves us with vendor-specific mappings as the only
> > option to make these chips work.
> > >
> > > A RISC-V platform can have non-cacheable mappings is following
> > > possible
> > > ways:
> > > 1) Fixed physical address range as non-cacheable using PMAs
> > > 2) Custom page table attributes
> > > 3) Svpmbt extension being defined by RVI
> > >
> > > Atish and me both think it is possible to have RISC-V specific DMA ops
> > > implementation which can handle all above case. Atish is already
> > > working on DMA ops implementation for RISC-V.
> > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please have a
> > look at:
> > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-email-
> > [email protected]/T/#u
>
> The icache_sync and __vdso_icache_sync will have to be addressed
> differently. The SBI DMA sync extension cannot address this.
Agree

>
> It seems Allwinner D1 have more non-standard stuff:
> 1) Custom PTE bits for IO-coherent access
> 2) Custom data cache flush/invalidate for DMA sync
> 3) Custom icache flush/invalidate
Yes, but 3) is a performance optimization, not critical for running.

>
> Other hand, BeagleV has only two problems:
> 1) Custom physical address range for IO-coherent access
> 2) Custom L2 cache flush/invalidate for DMA sync
https://github.com/starfive-tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d

Currently, they still use DMA sync with DMA descriptor, are you sure
they have minor memory physical address.

>
> From above #2, can be solved by SBI DMA sync call and
> Linux DMA ops for both BeagleV and Allwinner D1
>
> On BeagleV, issue #1 can be solved using "dma-ranges".
>
> On Allwinner D1, issues #1 and #3 need to be addressed
> separately.
>
> I think supporting BeagleV in upstream Linux is relatively
> easy compared to Allwinner D1.
>
> @Guo, please check if you can reserve dedicated
> physical address range for IO-coherent access (just like
> BeagleV). If yes, then we can tackle issue #1 for Allwinner
> D1 using "dma-ranges" DT property.
There is no dedicated physical address range for IO-coherent access in
D1. But the solution you mentioned couldn't solve all requirements.
Only one mirror physical address range is not enough, we need at least
three (Normal, DMA desc, frame buffer).
And that will triple the memory physical address which can't be
accepted by our users from the hardware design cost view.

"dma-ranges" DT property is a big early MIPS smell. ARM SOC users
can't accept it. (They just say replace the CPU, but don't touch
anything other.)

PTE attributes are the non-coherent solution for many years. MIPS also
follows that now:
ref arch/mips/include/asm/pgtable-bits.h & arch/mips/include/asm/pgtable.h

#ifndef _CACHE_CACHABLE_NO_WA
#define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_CACHABLE_WA
#define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_UNCACHED
#define _CACHE_UNCACHED (2<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_CACHABLE_NONCOHERENT
#define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_CACHABLE_CE
#define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_CACHABLE_COW
#define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_CACHABLE_CUW
#define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT)
#endif
#ifndef _CACHE_UNCACHED_ACCELERATED
#define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT)

We can't force our users to double/triplicate their physical memory regions.

>
> Regards,
> Anup
>
> >
> >
> > >
> > > >
> > > > This implementation, which adds some Kconfig entries that control
> > > > page table bits, definately isn't suitable for upstream. Allowing
> > > > users to set arbitrary page table bits will eventually conflict with
> > > > the standard, and is just going to be a mess. It'll also lead to
> > > > kernels that are only compatible with specific designs, which we're
> > > > trying very hard to avoid. At a bare minimum we'll need some way to
> > > > detect systems with these page table bits before setting them, and
> > > > some description of what the bits actually do so we can reason about
> > them.
> > >
> > > Yes, vendor specific Kconfig options are strict NO NO. We can't
> > > give-up the goal of unified kernel image for all platforms.
> > >
> > > Regards,
> > > Anup
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 04:49:28

by Anup Patel

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support



> -----Original Message-----
> From: Guo Ren <[email protected]>
> Sent: 07 June 2021 09:52
> To: Anup Patel <[email protected]>
> Cc: Atish Patra <[email protected]>; Palmer Dabbelt
> <[email protected]>; [email protected]; [email protected];
> Christoph Hellwig <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> Paul Walmsley <[email protected]>
> Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
>
> Hi Anup,
>
> On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <[email protected]> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Guo Ren <[email protected]>
> > > Sent: 06 June 2021 22:42
> > > To: Anup Patel <[email protected]>; Atish Patra
> > > <[email protected]>
> > > Cc: Palmer Dabbelt <[email protected]>; [email protected];
> > > [email protected]; Christoph Hellwig <[email protected]>;
> > > [email protected]; [email protected];
> > > [email protected]; linux- [email protected];
> > > [email protected]; linux- [email protected];
> > > [email protected]; Paul Walmsley <[email protected]>
> > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > >
> > > Hi Anup and Atish,
> > >
> > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]>
> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Palmer Dabbelt <[email protected]>
> > > > > Sent: 03 June 2021 09:43
> > > > > To: [email protected]
> > > > > Cc: [email protected]; [email protected]; Christoph Hellwig
> > > > > <[email protected]>; Anup Patel <[email protected]>;
> [email protected];
> > > > > [email protected]; [email protected]; linux-
> > > > > [email protected]; [email protected]; linux-
> > > > > [email protected]; [email protected]; Paul Walmsley
> > > > > <[email protected]>
> > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > > >
> > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected]
> wrote:
> > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel
> > > > > >> <[email protected]>
> > > > > wrote:
> > > > > >>>
> > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > > > > <[email protected]> wrote:
> > > > > >>> >
> > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph
> > > > > >>> > Hellwig
> > > > > wrote:
> > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren
> wrote:
> > > > > >>> > > > Since the existing RISC-V ISA cannot solve this
> > > > > >>> > > > problem, it is better to provide some configuration
> > > > > >>> > > > for the SOC vendor to
> > > > > customize.
> > > > > >>> > >
> > > > > >>> > > We've been talking about this problem for close to five years.
> > > > > >>> > > So no, if you don't manage to get the feature into the
> > > > > >>> > > ISA it can't be supported.
> > > > > >>> >
> > > > > >>> > Isn't it a good goal for Linux to support the capabilities
> > > > > >>> > present in the SoC that a currently being fab'd?
> > > > > >>> >
> > > > > >>> > I believe the CMO group only started last year [1] so the
> > > > > >>> > RV64GC SoCs that are going into mass production this year
> > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V
> > > > > >>> > ISA extension for handling cache management.
> > > > > >>>
> > > > > >>> The current Linux RISC-V policy is to only accept patches
> > > > > >>> for frozen or ratified ISA specs.
> > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > > > > >>>
> > > > > >>> This means even if emulate CMO instructions in OpenSBI, the
> > > > > >>> Linux patches won't be taken by Palmer because CMO
> > > > > >>> specification is still in draft stage.
> > > > > >> Before CMO specification release, could we use a sbi_ecall to
> > > > > >> solve the current problem? This is not against the
> > > > > >> specification, when CMO is ready we could let users choose to
> > > > > >> use the new CMO in
> > > Linux.
> > > > > >>
> > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > > > > >>
> > > > > >>>
> > > > > >>> Also, we all know how much time it takes for RISCV
> > > > > >>> international to freeze some spec. Judging by that we are
> > > > > >>> looking at another
> > > > > >>> 3-4 years at minimum.
> > > > > >
> > > > > > Sorry for being slow here, this thread got buried.
> > > > > >
> > > > > > I've been trying to work with a handful of folks at the RISC-V
> > > > > > foundation to try and get a subset of the various
> > > > > > in-development specifications (some simple CMOs, something
> > > > > > about non-caching in the page tables, and some way to prevent
> > > > > > speculative accesse from generating coherence traffic that will break
> non-coherent systems).
> > > > > > I'm not sure we can get this together quickly, but I'd prefer
> > > > > > to at least try before we jump to taking vendor-specificed behavior
> here.
> > > > > > It's obviously an up-hill battle to try and get specifications
> > > > > > through the process and I'm certainly not going to promise it
> > > > > > will work, but I'm hoping that the impending need to avoid
> > > > > > forking the ISA will be sufficient to get people behind
> > > > > > producing some specifications in a timely
> > > > > fashion.
> > > > > >
> > > > > > I wasn't aware than this chip had non-coherent devices until I
> > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip.
> > > > > > That was in a sense an easier problem because the SiFive IP in
> > > > > > it was never designed to have non-coherent devices so we'd
> > > > > > have to make anything work via a series of slow workarounds,
> > > > > > which would make emulating the eventually standardized
> > > > > > behavior reasonable in terms of performance (ie, everything
> > > > > > would be super slow so who really
> > > cares).
> > > > > >
> > > > > > I don't think relying on some sort of SBI call for the CMOs
> > > > > > whould be such a performance hit that it would prevent these
> > > > > > systems from being viable, but assuming you have reasonable
> > > > > > performance on your non-cached accesses then that's probably
> > > > > > not going to be viable to trap and emulate. At that point it
> > > > > > really just becomes silly to pretend that we're still making
> > > > > > things work by emulating the eventually ratified behavior, as
> > > > > > anyone who actually tries to use this thing to do IO would
> > > > > > need out of tree patches. I'm not sure exactly what the plan
> > > > > > is for the page table bits in the specification right now, but
> > > > > > if you can give me a pointer to some documentation then I'm
> > > > > > happy to try and push for something
> > > compatible.
> > > > > >
> > > > > > If we can't make the process work at the foundation then I'd
> > > > > > be strongly in favor of just biting the bullet and starting to
> > > > > > take vendor-specific code that's been implemented in hardware
> > > > > > and is necessarry to make things work acceptably. That's
> > > > > > obviously a sub-optimal solution as it'll lead to a bunch of
> > > > > > ISA fragmentation, but at least we'll be able to keep the
> > > > > > software stack
> > > together.
> > > > > >
> > > > > > Can you tell us when these will be in the hands of users?
> > > > > > That's pretty important here, as I don't want to be blocking
> > > > > > real users from having their hardware work. IIRC there were
> > > > > > some plans to distribute early boards, but it looks like the
> > > > > > foundation got involved and I guess I lost the thread at that point.
> > > > > >
> > > > > > Sorry this is all such a headache, but hopefully we can get
> > > > > > things sorted out.
> > > > >
> > > > > I talked with some of the RISC-V foundation folks, we're not
> > > > > going to have an ISA specification for the non-coherent stuff
> > > > > any time soon. I took a look at this code and I definately
> > > > > don't want to take it as is, but I'm not opposed to taking
> > > > > something that makes the
> > > hardware work as long as it's a lot cleaner.
> > > > > We've already got two of these non-coherent chips, I'm sure more
> > > > > will come, and I'd rather have the extra headaches than make
> > > > > everyone fork the software stack.
> > > >
> > > > Thanks for confirming. The CMO extension is still in early stages
> > > > so it will certainly take more time for them. After CMO extension
> > > > is finalized, it will take some more time to have actual RISC-V
> > > > platforms with
> > > CMO implemented.
> > > >
> > > > >
> > > > > After talking to Atish it looks like there's likely to be an SBI
> > > > > extension to handle the CMOs, which should let us avoid the bulk
> > > > > of the vendor-specific behavior in the kernel. I know some
> > > > > people are worried about adding to the SBI surface. I'm worried
> > > > > about that too, but that's way better than sticking a bunch of
> > > > > vendor-specific instructions into the kernel. The SBI extension
> > > > > should make for a straight-forward cache flush implementation in
> > > > > Linux, so let's just plan on
> > > that getting through quickly (as has been done before).
> > > >
> > > > Yes, I agree. We can have just a single SBI call which is meant
> > > > for DMA sync purpose only which means it will flush/invalidate
> > > > pages from all cache levels irrespective of the cache hierarchy (i.e.
> > > > flush/invalidate to RAM). The CMO extension might more generic
> > > > cache operations which can target any cache level.
> > > >
> > > > I am already preparing a write-up for SBI DMA sync call in SBI
> > > > spec. I will share it with you separately as well.
> > > >
> > > > >
> > > > > Unfortunately we've yet to come up with a way to handle the
> > > > > non-cacheable mappings without introducing a degree of
> > > > > vendor-specific behavior or seriously impacting performance
> > > > > (mark them as not valid and deal with them in the trap handler).
> > > > > I'm not really sure it counts as supporting the hardware if it's
> > > > > massively slow, so that really leaves us with vendor-specific
> > > > > mappings as the only
> > > option to make these chips work.
> > > >
> > > > A RISC-V platform can have non-cacheable mappings is following
> > > > possible
> > > > ways:
> > > > 1) Fixed physical address range as non-cacheable using PMAs
> > > > 2) Custom page table attributes
> > > > 3) Svpmbt extension being defined by RVI
> > > >
> > > > Atish and me both think it is possible to have RISC-V specific DMA
> > > > ops implementation which can handle all above case. Atish is
> > > > already working on DMA ops implementation for RISC-V.
> > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please
> > > have a look at:
> > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema
> > > il-
> > > [email protected]/T/#u
> >
> > The icache_sync and __vdso_icache_sync will have to be addressed
> > differently. The SBI DMA sync extension cannot address this.
> Agree
>
> >
> > It seems Allwinner D1 have more non-standard stuff:
> > 1) Custom PTE bits for IO-coherent access
> > 2) Custom data cache flush/invalidate for DMA sync
> > 3) Custom icache flush/invalidate
> Yes, but 3) is a performance optimization, not critical for running.
>
> >
> > Other hand, BeagleV has only two problems:
> > 1) Custom physical address range for IO-coherent access
> > 2) Custom L2 cache flush/invalidate for DMA sync
> https://github.com/starfive-
> tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d
>
> Currently, they still use DMA sync with DMA descriptor, are you sure they
> have minor memory physical address.
>
> >
> > From above #2, can be solved by SBI DMA sync call and Linux DMA ops
> > for both BeagleV and Allwinner D1
> >
> > On BeagleV, issue #1 can be solved using "dma-ranges".
> >
> > On Allwinner D1, issues #1 and #3 need to be addressed separately.
> >
> > I think supporting BeagleV in upstream Linux is relatively easy
> > compared to Allwinner D1.
> >
> > @Guo, please check if you can reserve dedicated physical address range
> > for IO-coherent access (just like BeagleV). If yes, then we can tackle
> > issue #1 for Allwinner
> > D1 using "dma-ranges" DT property.
> There is no dedicated physical address range for IO-coherent access in D1. But
> the solution you mentioned couldn't solve all requirements.
> Only one mirror physical address range is not enough, we need at least three
> (Normal, DMA desc, frame buffer).

How many non-coherent devices you really have?

I am guess lot of critical devices on Allwinner D1 are not coherent with CPU.
The problem for Allwinner D1 is even worst than I thought. If such critical
high through-put devices are not cache coherent with CPU then I am
speechless about Allwinner D1 situation.

> And that will triple the memory physical address which can't be accepted by
> our users from the hardware design cost view.
>
> "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't
> accept it. (They just say replace the CPU, but don't touch anything other.)
>
> PTE attributes are the non-coherent solution for many years. MIPS also
> follows that now:
> ref arch/mips/include/asm/pgtable-bits.h &
> arch/mips/include/asm/pgtable.h

RISC-V is in the process of standardizing Svpmbt extension.

Unfortunately, the higher order bits which your implementation uses is
not for SoC vendor use as-per the RISC-V privilege spec.

>
> #ifndef _CACHE_CACHABLE_NO_WA
> #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_CACHABLE_WA
> #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_UNCACHED
> #define _CACHE_UNCACHED (2<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_CACHABLE_NONCOHERENT
> #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_CACHABLE_CE
> #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_CACHABLE_COW
> #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_CACHABLE_CUW
> #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT)
> #endif
> #ifndef _CACHE_UNCACHED_ACCELERATED
> #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT)
>
> We can't force our users to double/triplicate their physical memory regions.

We are trying to find a workable solution here so that we don't have
to deal with custom PTE attributes which are reserved for RISC-V priv
specification only.

Regards,
Anup

>
> >
> > Regards,
> > Anup
> >
> > >
> > >
> > > >
> > > > >
> > > > > This implementation, which adds some Kconfig entries that
> > > > > control page table bits, definately isn't suitable for upstream.
> > > > > Allowing users to set arbitrary page table bits will eventually
> > > > > conflict with the standard, and is just going to be a mess.
> > > > > It'll also lead to kernels that are only compatible with
> > > > > specific designs, which we're trying very hard to avoid. At a
> > > > > bare minimum we'll need some way to detect systems with these
> > > > > page table bits before setting them, and some description of
> > > > > what the bits actually do so we can reason about
> > > them.
> > > >
> > > > Yes, vendor specific Kconfig options are strict NO NO. We can't
> > > > give-up the goal of unified kernel image for all platforms.
> > > >
> > > > Regards,
> > > > Anup
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Guo Ren
> > >
> > > ML: https://lore.kernel.org/linux-csky/
>
>
>
> --
> Best Regards
> Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

2021-06-07 05:13:28

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 12:47 PM Anup Patel <[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Guo Ren <[email protected]>
> > Sent: 07 June 2021 09:52
> > To: Anup Patel <[email protected]>
> > Cc: Atish Patra <[email protected]>; Palmer Dabbelt
> > <[email protected]>; [email protected]; [email protected];
> > Christoph Hellwig <[email protected]>; [email protected]; [email protected];
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected];
> > Paul Walmsley <[email protected]>
> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> >
> > Hi Anup,
> >
> > On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <[email protected]> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Guo Ren <[email protected]>
> > > > Sent: 06 June 2021 22:42
> > > > To: Anup Patel <[email protected]>; Atish Patra
> > > > <[email protected]>
> > > > Cc: Palmer Dabbelt <[email protected]>; [email protected];
> > > > [email protected]; Christoph Hellwig <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; linux- [email protected];
> > > > [email protected]; linux- [email protected];
> > > > [email protected]; Paul Walmsley <[email protected]>
> > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > >
> > > > Hi Anup and Atish,
> > > >
> > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]>
> > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Palmer Dabbelt <[email protected]>
> > > > > > Sent: 03 June 2021 09:43
> > > > > > To: [email protected]
> > > > > > Cc: [email protected]; [email protected]; Christoph Hellwig
> > > > > > <[email protected]>; Anup Patel <[email protected]>;
> > [email protected];
> > > > > > [email protected]; [email protected]; linux-
> > > > > > [email protected]; [email protected]; linux-
> > > > > > [email protected]; [email protected]; Paul Walmsley
> > > > > > <[email protected]>
> > > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > > > >
> > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected]
> > wrote:
> > > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel
> > > > > > >> <[email protected]>
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > > > > > <[email protected]> wrote:
> > > > > > >>> >
> > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph
> > > > > > >>> > Hellwig
> > > > > > wrote:
> > > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren
> > wrote:
> > > > > > >>> > > > Since the existing RISC-V ISA cannot solve this
> > > > > > >>> > > > problem, it is better to provide some configuration
> > > > > > >>> > > > for the SOC vendor to
> > > > > > customize.
> > > > > > >>> > >
> > > > > > >>> > > We've been talking about this problem for close to five years.
> > > > > > >>> > > So no, if you don't manage to get the feature into the
> > > > > > >>> > > ISA it can't be supported.
> > > > > > >>> >
> > > > > > >>> > Isn't it a good goal for Linux to support the capabilities
> > > > > > >>> > present in the SoC that a currently being fab'd?
> > > > > > >>> >
> > > > > > >>> > I believe the CMO group only started last year [1] so the
> > > > > > >>> > RV64GC SoCs that are going into mass production this year
> > > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V
> > > > > > >>> > ISA extension for handling cache management.
> > > > > > >>>
> > > > > > >>> The current Linux RISC-V policy is to only accept patches
> > > > > > >>> for frozen or ratified ISA specs.
> > > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > > > > > >>>
> > > > > > >>> This means even if emulate CMO instructions in OpenSBI, the
> > > > > > >>> Linux patches won't be taken by Palmer because CMO
> > > > > > >>> specification is still in draft stage.
> > > > > > >> Before CMO specification release, could we use a sbi_ecall to
> > > > > > >> solve the current problem? This is not against the
> > > > > > >> specification, when CMO is ready we could let users choose to
> > > > > > >> use the new CMO in
> > > > Linux.
> > > > > > >>
> > > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > > > > > >>
> > > > > > >>>
> > > > > > >>> Also, we all know how much time it takes for RISCV
> > > > > > >>> international to freeze some spec. Judging by that we are
> > > > > > >>> looking at another
> > > > > > >>> 3-4 years at minimum.
> > > > > > >
> > > > > > > Sorry for being slow here, this thread got buried.
> > > > > > >
> > > > > > > I've been trying to work with a handful of folks at the RISC-V
> > > > > > > foundation to try and get a subset of the various
> > > > > > > in-development specifications (some simple CMOs, something
> > > > > > > about non-caching in the page tables, and some way to prevent
> > > > > > > speculative accesse from generating coherence traffic that will break
> > non-coherent systems).
> > > > > > > I'm not sure we can get this together quickly, but I'd prefer
> > > > > > > to at least try before we jump to taking vendor-specificed behavior
> > here.
> > > > > > > It's obviously an up-hill battle to try and get specifications
> > > > > > > through the process and I'm certainly not going to promise it
> > > > > > > will work, but I'm hoping that the impending need to avoid
> > > > > > > forking the ISA will be sufficient to get people behind
> > > > > > > producing some specifications in a timely
> > > > > > fashion.
> > > > > > >
> > > > > > > I wasn't aware than this chip had non-coherent devices until I
> > > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip.
> > > > > > > That was in a sense an easier problem because the SiFive IP in
> > > > > > > it was never designed to have non-coherent devices so we'd
> > > > > > > have to make anything work via a series of slow workarounds,
> > > > > > > which would make emulating the eventually standardized
> > > > > > > behavior reasonable in terms of performance (ie, everything
> > > > > > > would be super slow so who really
> > > > cares).
> > > > > > >
> > > > > > > I don't think relying on some sort of SBI call for the CMOs
> > > > > > > whould be such a performance hit that it would prevent these
> > > > > > > systems from being viable, but assuming you have reasonable
> > > > > > > performance on your non-cached accesses then that's probably
> > > > > > > not going to be viable to trap and emulate. At that point it
> > > > > > > really just becomes silly to pretend that we're still making
> > > > > > > things work by emulating the eventually ratified behavior, as
> > > > > > > anyone who actually tries to use this thing to do IO would
> > > > > > > need out of tree patches. I'm not sure exactly what the plan
> > > > > > > is for the page table bits in the specification right now, but
> > > > > > > if you can give me a pointer to some documentation then I'm
> > > > > > > happy to try and push for something
> > > > compatible.
> > > > > > >
> > > > > > > If we can't make the process work at the foundation then I'd
> > > > > > > be strongly in favor of just biting the bullet and starting to
> > > > > > > take vendor-specific code that's been implemented in hardware
> > > > > > > and is necessarry to make things work acceptably. That's
> > > > > > > obviously a sub-optimal solution as it'll lead to a bunch of
> > > > > > > ISA fragmentation, but at least we'll be able to keep the
> > > > > > > software stack
> > > > together.
> > > > > > >
> > > > > > > Can you tell us when these will be in the hands of users?
> > > > > > > That's pretty important here, as I don't want to be blocking
> > > > > > > real users from having their hardware work. IIRC there were
> > > > > > > some plans to distribute early boards, but it looks like the
> > > > > > > foundation got involved and I guess I lost the thread at that point.
> > > > > > >
> > > > > > > Sorry this is all such a headache, but hopefully we can get
> > > > > > > things sorted out.
> > > > > >
> > > > > > I talked with some of the RISC-V foundation folks, we're not
> > > > > > going to have an ISA specification for the non-coherent stuff
> > > > > > any time soon. I took a look at this code and I definately
> > > > > > don't want to take it as is, but I'm not opposed to taking
> > > > > > something that makes the
> > > > hardware work as long as it's a lot cleaner.
> > > > > > We've already got two of these non-coherent chips, I'm sure more
> > > > > > will come, and I'd rather have the extra headaches than make
> > > > > > everyone fork the software stack.
> > > > >
> > > > > Thanks for confirming. The CMO extension is still in early stages
> > > > > so it will certainly take more time for them. After CMO extension
> > > > > is finalized, it will take some more time to have actual RISC-V
> > > > > platforms with
> > > > CMO implemented.
> > > > >
> > > > > >
> > > > > > After talking to Atish it looks like there's likely to be an SBI
> > > > > > extension to handle the CMOs, which should let us avoid the bulk
> > > > > > of the vendor-specific behavior in the kernel. I know some
> > > > > > people are worried about adding to the SBI surface. I'm worried
> > > > > > about that too, but that's way better than sticking a bunch of
> > > > > > vendor-specific instructions into the kernel. The SBI extension
> > > > > > should make for a straight-forward cache flush implementation in
> > > > > > Linux, so let's just plan on
> > > > that getting through quickly (as has been done before).
> > > > >
> > > > > Yes, I agree. We can have just a single SBI call which is meant
> > > > > for DMA sync purpose only which means it will flush/invalidate
> > > > > pages from all cache levels irrespective of the cache hierarchy (i.e.
> > > > > flush/invalidate to RAM). The CMO extension might more generic
> > > > > cache operations which can target any cache level.
> > > > >
> > > > > I am already preparing a write-up for SBI DMA sync call in SBI
> > > > > spec. I will share it with you separately as well.
> > > > >
> > > > > >
> > > > > > Unfortunately we've yet to come up with a way to handle the
> > > > > > non-cacheable mappings without introducing a degree of
> > > > > > vendor-specific behavior or seriously impacting performance
> > > > > > (mark them as not valid and deal with them in the trap handler).
> > > > > > I'm not really sure it counts as supporting the hardware if it's
> > > > > > massively slow, so that really leaves us with vendor-specific
> > > > > > mappings as the only
> > > > option to make these chips work.
> > > > >
> > > > > A RISC-V platform can have non-cacheable mappings is following
> > > > > possible
> > > > > ways:
> > > > > 1) Fixed physical address range as non-cacheable using PMAs
> > > > > 2) Custom page table attributes
> > > > > 3) Svpmbt extension being defined by RVI
> > > > >
> > > > > Atish and me both think it is possible to have RISC-V specific DMA
> > > > > ops implementation which can handle all above case. Atish is
> > > > > already working on DMA ops implementation for RISC-V.
> > > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please
> > > > have a look at:
> > > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema
> > > > il-
> > > > [email protected]/T/#u
> > >
> > > The icache_sync and __vdso_icache_sync will have to be addressed
> > > differently. The SBI DMA sync extension cannot address this.
> > Agree
> >
> > >
> > > It seems Allwinner D1 have more non-standard stuff:
> > > 1) Custom PTE bits for IO-coherent access
> > > 2) Custom data cache flush/invalidate for DMA sync
> > > 3) Custom icache flush/invalidate
> > Yes, but 3) is a performance optimization, not critical for running.
> >
> > >
> > > Other hand, BeagleV has only two problems:
> > > 1) Custom physical address range for IO-coherent access
> > > 2) Custom L2 cache flush/invalidate for DMA sync
> > https://github.com/starfive-
> > tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d
> >
> > Currently, they still use DMA sync with DMA descriptor, are you sure they
> > have minor memory physical address.
> >
> > >
> > > From above #2, can be solved by SBI DMA sync call and Linux DMA ops
> > > for both BeagleV and Allwinner D1
> > >
> > > On BeagleV, issue #1 can be solved using "dma-ranges".
> > >
> > > On Allwinner D1, issues #1 and #3 need to be addressed separately.
> > >
> > > I think supporting BeagleV in upstream Linux is relatively easy
> > > compared to Allwinner D1.
> > >
> > > @Guo, please check if you can reserve dedicated physical address range
> > > for IO-coherent access (just like BeagleV). If yes, then we can tackle
> > > issue #1 for Allwinner
> > > D1 using "dma-ranges" DT property.
> > There is no dedicated physical address range for IO-coherent access in D1. But
> > the solution you mentioned couldn't solve all requirements.
> > Only one mirror physical address range is not enough, we need at least three
> > (Normal, DMA desc, frame buffer).
>
> How many non-coherent devices you really have?
>
> I am guess lot of critical devices on Allwinner D1 are not coherent with CPU.
> The problem for Allwinner D1 is even worst than I thought. If such critical
> high through-put devices are not cache coherent with CPU then I am
> speechless about Allwinner D1 situation.
Allwinner D1 is a cost-down product and there is no cache coherent
device at all. Cache coherent interconnect will increase the chip
design cost and the performance is enough in their scenario.

So that why we need Srong Order + noncache & Weak Order + noncache to
optimization.

From T-HEAD side we could privide two kinds of solution of DMA coherent.
- Let SOC vendor update coherent interconnect, and our CPU could
support coherent protocal.
- Let SOC vendor connect their DMA device with our CPU LL cache
coherent interface.

But we can't force them do that. They want how my origin soc works
then make it work with your RV core. They know trade off coherency or
non-coherency in their busisness scenario.

>
> > And that will triple the memory physical address which can't be accepted by
> > our users from the hardware design cost view.
> >
> > "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't
> > accept it. (They just say replace the CPU, but don't touch anything other.)
> >
> > PTE attributes are the non-coherent solution for many years. MIPS also
> > follows that now:
> > ref arch/mips/include/asm/pgtable-bits.h &
> > arch/mips/include/asm/pgtable.h
>
> RISC-V is in the process of standardizing Svpmbt extension.
>
> Unfortunately, the higher order bits which your implementation uses is
> not for SoC vendor use as-per the RISC-V privilege spec.
For a while, I had placed my hopes on C-bit, but my fantasy was
disillusioned. -_-!

>
> >
> > #ifndef _CACHE_CACHABLE_NO_WA
> > #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_WA
> > #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_UNCACHED
> > #define _CACHE_UNCACHED (2<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_NONCOHERENT
> > #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_CE
> > #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_COW
> > #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_CUW
> > #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_UNCACHED_ACCELERATED
> > #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT)
> >
> > We can't force our users to double/triplicate their physical memory regions.
>
> We are trying to find a workable solution here so that we don't have
> to deal with custom PTE attributes which are reserved for RISC-V priv
> specification only.
Thank you for your hard work in this regard, sincerely.

>
> Regards,
> Anup
>
> >
> > >
> > > Regards,
> > > Anup
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > This implementation, which adds some Kconfig entries that
> > > > > > control page table bits, definately isn't suitable for upstream.
> > > > > > Allowing users to set arbitrary page table bits will eventually
> > > > > > conflict with the standard, and is just going to be a mess.
> > > > > > It'll also lead to kernels that are only compatible with
> > > > > > specific designs, which we're trying very hard to avoid. At a
> > > > > > bare minimum we'll need some way to detect systems with these
> > > > > > page table bits before setting them, and some description of
> > > > > > what the bits actually do so we can reason about
> > > > them.
> > > > >
> > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't
> > > > > give-up the goal of unified kernel image for all platforms.
> > > > >
> > > > > Regards,
> > > > > Anup
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > > Guo Ren
> > > >
> > > > ML: https://lore.kernel.org/linux-csky/
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 05:18:29

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

:


On Mon, Jun 7, 2021 at 12:47 PM Anup Patel <[email protected]> wrote:
>
>
>
> > -----Original Message-----
> > From: Guo Ren <[email protected]>
> > Sent: 07 June 2021 09:52
> > To: Anup Patel <[email protected]>
> > Cc: Atish Patra <[email protected]>; Palmer Dabbelt
> > <[email protected]>; [email protected]; [email protected];
> > Christoph Hellwig <[email protected]>; [email protected]; [email protected];
> > [email protected]; [email protected]; linux-
> > [email protected]; [email protected]; [email protected];
> > Paul Walmsley <[email protected]>
> > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> >
> > Hi Anup,
> >
> > On Mon, Jun 7, 2021 at 11:38 AM Anup Patel <[email protected]> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Guo Ren <[email protected]>
> > > > Sent: 06 June 2021 22:42
> > > > To: Anup Patel <[email protected]>; Atish Patra
> > > > <[email protected]>
> > > > Cc: Palmer Dabbelt <[email protected]>; [email protected];
> > > > [email protected]; Christoph Hellwig <[email protected]>;
> > > > [email protected]; [email protected];
> > > > [email protected]; linux- [email protected];
> > > > [email protected]; linux- [email protected];
> > > > [email protected]; Paul Walmsley <[email protected]>
> > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > >
> > > > Hi Anup and Atish,
> > > >
> > > > On Thu, Jun 3, 2021 at 2:00 PM Anup Patel <[email protected]>
> > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Palmer Dabbelt <[email protected]>
> > > > > > Sent: 03 June 2021 09:43
> > > > > > To: [email protected]
> > > > > > Cc: [email protected]; [email protected]; Christoph Hellwig
> > > > > > <[email protected]>; Anup Patel <[email protected]>;
> > [email protected];
> > > > > > [email protected]; [email protected]; linux-
> > > > > > [email protected]; [email protected]; linux-
> > > > > > [email protected]; [email protected]; Paul Walmsley
> > > > > > <[email protected]>
> > > > > > Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support
> > > > > >
> > > > > > On Sat, 29 May 2021 17:30:18 PDT (-0700), Palmer Dabbelt wrote:
> > > > > > > On Fri, 21 May 2021 17:36:08 PDT (-0700), [email protected]
> > wrote:
> > > > > > >> On Wed, May 19, 2021 at 3:15 PM Anup Patel
> > > > > > >> <[email protected]>
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> On Wed, May 19, 2021 at 12:24 PM Drew Fustini
> > > > > > <[email protected]> wrote:
> > > > > > >>> >
> > > > > > >>> > On Wed, May 19, 2021 at 08:06:17AM +0200, Christoph
> > > > > > >>> > Hellwig
> > > > > > wrote:
> > > > > > >>> > > On Wed, May 19, 2021 at 02:05:00PM +0800, Guo Ren
> > wrote:
> > > > > > >>> > > > Since the existing RISC-V ISA cannot solve this
> > > > > > >>> > > > problem, it is better to provide some configuration
> > > > > > >>> > > > for the SOC vendor to
> > > > > > customize.
> > > > > > >>> > >
> > > > > > >>> > > We've been talking about this problem for close to five years.
> > > > > > >>> > > So no, if you don't manage to get the feature into the
> > > > > > >>> > > ISA it can't be supported.
> > > > > > >>> >
> > > > > > >>> > Isn't it a good goal for Linux to support the capabilities
> > > > > > >>> > present in the SoC that a currently being fab'd?
> > > > > > >>> >
> > > > > > >>> > I believe the CMO group only started last year [1] so the
> > > > > > >>> > RV64GC SoCs that are going into mass production this year
> > > > > > >>> > would not have had the opporuntiy of utilizing any RISC-V
> > > > > > >>> > ISA extension for handling cache management.
> > > > > > >>>
> > > > > > >>> The current Linux RISC-V policy is to only accept patches
> > > > > > >>> for frozen or ratified ISA specs.
> > > > > > >>> (Refer, Documentation/riscv/patch-acceptance.rst)
> > > > > > >>>
> > > > > > >>> This means even if emulate CMO instructions in OpenSBI, the
> > > > > > >>> Linux patches won't be taken by Palmer because CMO
> > > > > > >>> specification is still in draft stage.
> > > > > > >> Before CMO specification release, could we use a sbi_ecall to
> > > > > > >> solve the current problem? This is not against the
> > > > > > >> specification, when CMO is ready we could let users choose to
> > > > > > >> use the new CMO in
> > > > Linux.
> > > > > > >>
> > > > > > >> From a tech view, CMO trap emulation is the same as sbi_ecall.
> > > > > > >>
> > > > > > >>>
> > > > > > >>> Also, we all know how much time it takes for RISCV
> > > > > > >>> international to freeze some spec. Judging by that we are
> > > > > > >>> looking at another
> > > > > > >>> 3-4 years at minimum.
> > > > > > >
> > > > > > > Sorry for being slow here, this thread got buried.
> > > > > > >
> > > > > > > I've been trying to work with a handful of folks at the RISC-V
> > > > > > > foundation to try and get a subset of the various
> > > > > > > in-development specifications (some simple CMOs, something
> > > > > > > about non-caching in the page tables, and some way to prevent
> > > > > > > speculative accesse from generating coherence traffic that will break
> > non-coherent systems).
> > > > > > > I'm not sure we can get this together quickly, but I'd prefer
> > > > > > > to at least try before we jump to taking vendor-specificed behavior
> > here.
> > > > > > > It's obviously an up-hill battle to try and get specifications
> > > > > > > through the process and I'm certainly not going to promise it
> > > > > > > will work, but I'm hoping that the impending need to avoid
> > > > > > > forking the ISA will be sufficient to get people behind
> > > > > > > producing some specifications in a timely
> > > > > > fashion.
> > > > > > >
> > > > > > > I wasn't aware than this chip had non-coherent devices until I
> > > > > > > saw this thread, so we'd been mostly focused on the Beagle V chip.
> > > > > > > That was in a sense an easier problem because the SiFive IP in
> > > > > > > it was never designed to have non-coherent devices so we'd
> > > > > > > have to make anything work via a series of slow workarounds,
> > > > > > > which would make emulating the eventually standardized
> > > > > > > behavior reasonable in terms of performance (ie, everything
> > > > > > > would be super slow so who really
> > > > cares).
> > > > > > >
> > > > > > > I don't think relying on some sort of SBI call for the CMOs
> > > > > > > whould be such a performance hit that it would prevent these
> > > > > > > systems from being viable, but assuming you have reasonable
> > > > > > > performance on your non-cached accesses then that's probably
> > > > > > > not going to be viable to trap and emulate. At that point it
> > > > > > > really just becomes silly to pretend that we're still making
> > > > > > > things work by emulating the eventually ratified behavior, as
> > > > > > > anyone who actually tries to use this thing to do IO would
> > > > > > > need out of tree patches. I'm not sure exactly what the plan
> > > > > > > is for the page table bits in the specification right now, but
> > > > > > > if you can give me a pointer to some documentation then I'm
> > > > > > > happy to try and push for something
> > > > compatible.
> > > > > > >
> > > > > > > If we can't make the process work at the foundation then I'd
> > > > > > > be strongly in favor of just biting the bullet and starting to
> > > > > > > take vendor-specific code that's been implemented in hardware
> > > > > > > and is necessarry to make things work acceptably. That's
> > > > > > > obviously a sub-optimal solution as it'll lead to a bunch of
> > > > > > > ISA fragmentation, but at least we'll be able to keep the
> > > > > > > software stack
> > > > together.
> > > > > > >
> > > > > > > Can you tell us when these will be in the hands of users?
> > > > > > > That's pretty important here, as I don't want to be blocking
> > > > > > > real users from having their hardware work. IIRC there were
> > > > > > > some plans to distribute early boards, but it looks like the
> > > > > > > foundation got involved and I guess I lost the thread at that point.
> > > > > > >
> > > > > > > Sorry this is all such a headache, but hopefully we can get
> > > > > > > things sorted out.
> > > > > >
> > > > > > I talked with some of the RISC-V foundation folks, we're not
> > > > > > going to have an ISA specification for the non-coherent stuff
> > > > > > any time soon. I took a look at this code and I definately
> > > > > > don't want to take it as is, but I'm not opposed to taking
> > > > > > something that makes the
> > > > hardware work as long as it's a lot cleaner.
> > > > > > We've already got two of these non-coherent chips, I'm sure more
> > > > > > will come, and I'd rather have the extra headaches than make
> > > > > > everyone fork the software stack.
> > > > >
> > > > > Thanks for confirming. The CMO extension is still in early stages
> > > > > so it will certainly take more time for them. After CMO extension
> > > > > is finalized, it will take some more time to have actual RISC-V
> > > > > platforms with
> > > > CMO implemented.
> > > > >
> > > > > >
> > > > > > After talking to Atish it looks like there's likely to be an SBI
> > > > > > extension to handle the CMOs, which should let us avoid the bulk
> > > > > > of the vendor-specific behavior in the kernel. I know some
> > > > > > people are worried about adding to the SBI surface. I'm worried
> > > > > > about that too, but that's way better than sticking a bunch of
> > > > > > vendor-specific instructions into the kernel. The SBI extension
> > > > > > should make for a straight-forward cache flush implementation in
> > > > > > Linux, so let's just plan on
> > > > that getting through quickly (as has been done before).
> > > > >
> > > > > Yes, I agree. We can have just a single SBI call which is meant
> > > > > for DMA sync purpose only which means it will flush/invalidate
> > > > > pages from all cache levels irrespective of the cache hierarchy (i.e.
> > > > > flush/invalidate to RAM). The CMO extension might more generic
> > > > > cache operations which can target any cache level.
> > > > >
> > > > > I am already preparing a write-up for SBI DMA sync call in SBI
> > > > > spec. I will share it with you separately as well.
> > > > >
> > > > > >
> > > > > > Unfortunately we've yet to come up with a way to handle the
> > > > > > non-cacheable mappings without introducing a degree of
> > > > > > vendor-specific behavior or seriously impacting performance
> > > > > > (mark them as not valid and deal with them in the trap handler).
> > > > > > I'm not really sure it counts as supporting the hardware if it's
> > > > > > massively slow, so that really leaves us with vendor-specific
> > > > > > mappings as the only
> > > > option to make these chips work.
> > > > >
> > > > > A RISC-V platform can have non-cacheable mappings is following
> > > > > possible
> > > > > ways:
> > > > > 1) Fixed physical address range as non-cacheable using PMAs
> > > > > 2) Custom page table attributes
> > > > > 3) Svpmbt extension being defined by RVI
> > > > >
> > > > > Atish and me both think it is possible to have RISC-V specific DMA
> > > > > ops implementation which can handle all above case. Atish is
> > > > > already working on DMA ops implementation for RISC-V.
> > > > Not only DMA ops, but also icache_sync & __vdso_icache_sync. Please
> > > > have a look at:
> > > > https://lore.kernel.org/linux-riscv/1622970249-50770-12-git-send-ema
> > > > il-
> > > > [email protected]/T/#u
> > >
> > > The icache_sync and __vdso_icache_sync will have to be addressed
> > > differently. The SBI DMA sync extension cannot address this.
> > Agree
> >
> > >
> > > It seems Allwinner D1 have more non-standard stuff:
> > > 1) Custom PTE bits for IO-coherent access
> > > 2) Custom data cache flush/invalidate for DMA sync
> > > 3) Custom icache flush/invalidate
> > Yes, but 3) is a performance optimization, not critical for running.
> >
> > >
> > > Other hand, BeagleV has only two problems:
> > > 1) Custom physical address range for IO-coherent access
> > > 2) Custom L2 cache flush/invalidate for DMA sync
> > https://github.com/starfive-
> > tech/linux/commit/d4c4044c08134dca8e5eaaeb6d3faf97dc453b6d
> >
> > Currently, they still use DMA sync with DMA descriptor, are you sure they
> > have minor memory physical address.
> >
> > >
> > > From above #2, can be solved by SBI DMA sync call and Linux DMA ops
> > > for both BeagleV and Allwinner D1
> > >
> > > On BeagleV, issue #1 can be solved using "dma-ranges".
> > >
> > > On Allwinner D1, issues #1 and #3 need to be addressed separately.
> > >
> > > I think supporting BeagleV in upstream Linux is relatively easy
> > > compared to Allwinner D1.
> > >
> > > @Guo, please check if you can reserve dedicated physical address range
> > > for IO-coherent access (just like BeagleV). If yes, then we can tackle
> > > issue #1 for Allwinner
> > > D1 using "dma-ranges" DT property.
> > There is no dedicated physical address range for IO-coherent access in D1. But
> > the solution you mentioned couldn't solve all requirements.
> > Only one mirror physical address range is not enough, we need at least three
> > (Normal, DMA desc, frame buffer).
>
> How many non-coherent devices you really have?
>
> I am guess lot of critical devices on Allwinner D1 are not coherent with CPU.
> The problem for Allwinner D1 is even worst than I thought. If such critical
> high through-put devices are not cache coherent with CPU then I am
> speechless about Allwinner D1 situation.
>
> > And that will triple the memory physical address which can't be accepted by
> > our users from the hardware design cost view.
> >
> > "dma-ranges" DT property is a big early MIPS smell. ARM SOC users can't
> > accept it. (They just say replace the CPU, but don't touch anything other.)
> >
> > PTE attributes are the non-coherent solution for many years. MIPS also
> > follows that now:
> > ref arch/mips/include/asm/pgtable-bits.h &
> > arch/mips/include/asm/pgtable.h
>
> RISC-V is in the process of standardizing Svpmbt extension.
>
> Unfortunately, the higher order bits which your implementation uses is
> not for SoC vendor use as-per the RISC-V privilege spec.
>
> >
> > #ifndef _CACHE_CACHABLE_NO_WA
> > #define _CACHE_CACHABLE_NO_WA (0<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_WA
> > #define _CACHE_CACHABLE_WA (1<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_UNCACHED
> > #define _CACHE_UNCACHED (2<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_NONCOHERENT
> > #define _CACHE_CACHABLE_NONCOHERENT (3<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_CE
> > #define _CACHE_CACHABLE_CE (4<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_COW
> > #define _CACHE_CACHABLE_COW (5<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_CACHABLE_CUW
> > #define _CACHE_CACHABLE_CUW (6<<_CACHE_SHIFT)
> > #endif
> > #ifndef _CACHE_UNCACHED_ACCELERATED
> > #define _CACHE_UNCACHED_ACCELERATED (7<<_CACHE_SHIFT)
> >
> > We can't force our users to double/triplicate their physical memory regions.
>
> We are trying to find a workable solution here so that we don't have
> to deal with custom PTE attributes which are reserved for RISC-V priv
> specification only.
How do think about my new patch of custom PTE attributes?
https://lore.kernel.org/linux-riscv/[email protected]/T/#mdc0dacba57346b5ac59a01961495c132b93cfcdb

>
> Regards,
> Anup
>
> >
> > >
> > > Regards,
> > > Anup
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > This implementation, which adds some Kconfig entries that
> > > > > > control page table bits, definately isn't suitable for upstream.
> > > > > > Allowing users to set arbitrary page table bits will eventually
> > > > > > conflict with the standard, and is just going to be a mess.
> > > > > > It'll also lead to kernels that are only compatible with
> > > > > > specific designs, which we're trying very hard to avoid. At a
> > > > > > bare minimum we'll need some way to detect systems with these
> > > > > > page table bits before setting them, and some description of
> > > > > > what the bits actually do so we can reason about
> > > > them.
> > > > >
> > > > > Yes, vendor specific Kconfig options are strict NO NO. We can't
> > > > > give-up the goal of unified kernel image for all platforms.
> > > > >
> > > > > Regards,
> > > > > Anup
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > > Guo Ren
> > > >
> > > > ML: https://lore.kernel.org/linux-csky/
> >
> >
> >
> > --
> > Best Regards
> > Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 06:28:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote:
> >From Linux non-coherency view, we need:
> - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors
> - Non-cache + weak order to deal with framebuffer drivers
> - CMO dma_sync to sync cache with DMA devices

This is not strictly true. At the very minimum you only need cache
invalidation and writeback instructions. For example early parisc
CPUs and some m68knommu SOCs have no support for uncached areas at all,
and Linux works. But to be fair this is very painful and supports only
very limited periphals. So for modern full Linux support some uncahed
memory is advisable. But that doesn't have to be using PTE attributes.
It could also be physical memory regions that are either totally fixed
or somewhat dynamic.

2021-06-07 06:42:54

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 2:27 PM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote:
> > >From Linux non-coherency view, we need:
> > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors
> > - Non-cache + weak order to deal with framebuffer drivers
> > - CMO dma_sync to sync cache with DMA devices
>
> This is not strictly true. At the very minimum you only need cache
> invalidation and writeback instructions. For example early parisc
> CPUs and some m68knommu SOCs have no support for uncached areas at all,
> and Linux works. But to be fair this is very painful and supports only
> very limited periphals. So for modern full Linux support some uncahed
> memory is advisable. But that doesn't have to be using PTE attributes.
> It could also be physical memory regions that are either totally fixed
Double/Triple the size of physical memory regions can't be accepted by
SOC vendors, because it wastes HW resources.
Some cost-down soc interconnects only have 32bit~34bit width of
physical address, are you sure you could force them to expand it? (I
can't)

> or somewhat dynamic.
How can HW implement with dynamic modifying PMA? What's the granularity?



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 06:53:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 07, 2021 at 02:41:14PM +0800, Guo Ren wrote:
> Double/Triple the size of physical memory regions can't be accepted by
> SOC vendors, because it wastes HW resources.
> Some cost-down soc interconnects only have 32bit~34bit width of
> physical address, are you sure you could force them to expand it? (I
> can't)
>
> > or somewhat dynamic.
> How can HW implement with dynamic modifying PMA? What's the granularity?

I'm just stating the requirements from the Linux DMA perspective. You
also do not need tripple the address space, just double.

2021-06-07 07:48:28

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 2:51 PM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Jun 07, 2021 at 02:41:14PM +0800, Guo Ren wrote:
> > Double/Triple the size of physical memory regions can't be accepted by
> > SOC vendors, because it wastes HW resources.
> > Some cost-down soc interconnects only have 32bit~34bit width of
> > physical address, are you sure you could force them to expand it? (I
> > can't)
> >
> > > or somewhat dynamic.
> > How can HW implement with dynamic modifying PMA? What's the granularity?
>
> I'm just stating the requirements from the Linux DMA perspective. You
> also do not need tripple the address space, just double.

With double, you only got "strong order + non-cache" for the DMA
descriptor. How about write-combine scenario?

Even, double physical memory address space also wastes HW resources.




--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-07 08:37:47

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Στις 2021-06-07 06:19, Guo Ren έγραψε:
>> The C-bit was recently dropped, there is a new proposal for Page Based
>> Memory Attributes (PBMT) that we can work on / push for.
> C-bit still needs discussion, we shouldn't drop it directly.
>

You can always participate on the discussion on virtmem mailing list.

> Raise a page fault won't solve anything. We still need access to the
> page with proper performance.
>

The point is that future hw implementations will be required to return a
page fault in case we tamper with those reserved bits, they won't just
ignore them. Supporting custom values there means supporting
non-compliant implementations.

>
> We need PTEs to provide a non-coherency solution, and only CMO
> instructions are not enough. We can't modify so many Linux drivers to
> fit it.
> From Linux non-coherency view, we need:
> - Non-cache + Strong Order PTE attributes to deal with drivers' DMA
> descriptors
> - Non-cache + weak order to deal with framebuffer drivers
> - CMO dma_sync to sync cache with DMA devices
> - Userspace icache_sync solution, which prevents calls to S-mode with
> IPI fence.i. (Necessary to JIT java scenarios.)
>
> All above are not in spec, but the real chips are done.
> (Actually, these have been talked about for more than five years, we
> still haven't the uniform idea.)
>
> The idea of C-bit is really important for us which prevents our chips
> violates the spec.

Have you checked the PBMT proposal ? It defines (so far) the following
attributes that can be set on PTEs to override the PMAs of the
underlying physical memory:

Bits [62:61]
00 (WB) -> Cacheable, default ordering
01 (NC) -> Noncacheable, default ordering
10 (IO) -> Noncacheable, strong ordering

So it'll cover the use cases you mention.

2021-06-08 12:29:32

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Sat, Jun 5, 2021 at 12:12 AM Palmer Dabbelt <[email protected]> wrote:
>
> On Fri, 04 Jun 2021 07:47:22 PDT (-0700), [email protected] wrote:
> > Hi Arnd & Palmer,
> >
> > Sorry for the delayed reply, I'm working on the next version of the patch.
> >
> > On Fri, Jun 4, 2021 at 5:56 PM Arnd Bergmann <[email protected]> wrote:
> >>
> >> On Thu, Jun 3, 2021 at 5:39 PM Palmer Dabbelt <[email protected]> wrote:
> >> > On Wed, 02 Jun 2021 23:00:29 PDT (-0700), Anup Patel wrote:
> >> > >> This implementation, which adds some Kconfig entries that control page table
> >> > >> bits, definately isn't suitable for upstream. Allowing users to set arbitrary
> >> > >> page table bits will eventually conflict with the standard, and is just going to
> >> > >> be a mess. It'll also lead to kernels that are only compatible with specific
> >> > >> designs, which we're trying very hard to avoid. At a bare minimum we'll need
> >> > >> some way to detect systems with these page table bits before setting them,
> >> > >> and some description of what the bits actually do so we can reason about
> >> > >> them.
> >> > >
> >> > > Yes, vendor specific Kconfig options are strict NO NO. We can't give-up the
> >> > > goal of unified kernel image for all platforms.
> > Okay, Agree. Please help review the next version of the patch.
> >
> >> >
> >> > I think this is just a phrasing issue, but just to be sure:
> >> >
> >> > IMO it's not that they're vendor-specific Kconfig options, it's that
> >> > turning them on will conflict with standard systems (and other vendors).
> >> > We've already got the ability to select sets of Kconfig settings that
> >> > will only boot on one vendor's system, which is fine, as long as there
> >> > remains a set of Kconfig settings that will boot on all systems.
> >> >
> >> > An example here would be the errata: every system has errata of some
> >> > sort, so if we start flipping off various vendor's errata Kconfigs
> >> > you'll end up with kernels that only function properly on some systems.
> >> > That's fine with me, as long as it's possible to turn on all vendor's
> >> > errata Kconfigs at the same time and the resulting kernel functions
> >> > correctly on all systems.
> >>
> >> Yes, this is generally the goal, and it would be great to have that
> >> working in a way where a 'defconfig' build just turns on all the options
> >> that are needed to use any SoC specific features and drivers while
> >> still working on all hardware. There are however limits you may run
> >> into at some point, and other architectures usually only manage to span
> >> some 10 to 15 years of hardware implementations with a single
> >> kernel before it get really hard.
> > I could follow the goal in the next version of the patchset. Please
> > help review, thx.
>
> IMO we're essentially here now with the RISC-V stuff: defconfig flips on
> everything necesasry to boot normal-smelling SOCs, with everything being
> detected as the system boots. We have some wacky configurations like
> !MMU and XIP that are coupled to the hardware, but (and sorry for
> crossing the other threads, I missed your pointer as it's early here) as
> I said in the other thread it might be time to make it explicit that
> those things are non-portable.
>
> The hope here has always been that we'd have enough in the standards
> that we could avoid a proliferation of vendor-specific code. We've
> always put a strong "things keep working forever" stake in the ground in
> RISC-V land, but that's largely been because we were counting on the
> standards existing that make support easy. In practice we don't have
> those standards so we're ending up with a fairly large software base
> that is required to support everything. We don't have all that much
> hardware right now so we'll have to see how it goes, but for now I'm in
> favor of keeping defconfig as a "boots on everything" sort of setup --
> both because it makes life easier for users, and because it makes issues
> like the non-portable Kconfigs that showed up here quite explicit.
I reuse the Image header to pass vendor magic to init PTE attributes'
variable before setup_vm. Can you give me some feedback on that patch?
https://lore.kernel.org/linux-riscv/[email protected]/T/#mdc0dacba57346b5ac59a01961495c132b93cfcdb

>
> If we get to 10/15 years of hardware then I'm sure we'll be removing old
> systems from defconfig (or maybe even the kernel entirely, a lot of this
> stuff isn't in production). I'm just hoping we make it that far ;)
>
> >> To give some common examples that make it break down:
> >>
> >> - 32-bit vs 64-bit already violates that rule on risc-v (as it does on
> >> most other architectures)
>
> Yes, and there's no way around that on RISC-V. They're different base
> ISAs therefor re-define the same instructions, so we're essentially at
> two kernel binaries by that point. The platform spec says rv64gc, so we
> can kind of punt on this one for now. If rv32 hardware shows up
> we'll probably want a standard system there too, which is why we've
> avoided coupling kernel portability to XLEN.
>
> >> - architectures that support both big-endian and little-endian kernels
> >> tend to have platforms that require one or the other (e.g. mips,
> >> though not arm). Not an issue for you.
>
> It is now! We've added big-endian to RISC-V. There's no hardware yet
> and very little software support. IMO the right answer is to ban that
> from the platform spec, but again it'll depnd on what vendors want to
> build (though anyone is listening, please don't make my life miserable
> ;)).
>
> >> - page table formats are the main cause of incompatibility: arm32
> >> and x86-32 require three-level tables for certain features, but those
> >> are incompatible with older cores, arm64 supports three different
> >> page sizes, but none of them works on all cores (4KB almost works
> >> everywhere).
>
> We actually have some support on the works for multiple page table
> levels in a single binary, which should help with a lot of that
> incompatibility. I don't know of any plans to couple other page table
> features to the number of levels, though.
>
> >> - SMP-enabled ARMv7 kernels can be configured to run on either
> >> ARMv6 or ARMv8, but not both, in this case because of incompatible
> >> barrier instructions.
>
> Our barriers aren't quite split the same way, but we do have two memory
> models (RVWMO and TSO). IIUC we should be able to support both in the
> same kernels with some patching, but the resulting kernels would be
> biased towards one memory models over the other WRT performance. Again,
> we'll have to see what the vendors do and I'm hoping we don't end up
> with too many headaches.
>
> >> - 32-bit Arm has a couple more remaining features that require building
> >> a machine specific kernel if enabled because they hardcode physical
> >> addresses: early printk (debug_ll, not the normal earlycon), NOMMU,
> >> and XIP.
>
> We've got NOMMU and XIP as well, but we have some SBI support for early
> printk. IMO we're not really sure if we've decoupled all the PA layout
> dependencies yet from Linux, as we really only support one vendor's
> systems, but we've had a lot of work lately on beefing up our memory
> layout so with any luck we'll be able to quickly sort out anything that
> comes up.



--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-08 15:36:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Tue, Jun 08, 2021 at 03:00:17PM +0000, David Laight wrote:
> It is almost impossible to interface to many ethernet chips without
> either coherent or uncached memory for the descriptor rings.
> The status bits on the transmit ring are particularly problematic.
>
> The receive ring can be done with writeback+invalidate provided you
> fill a cache line at a time.

It is horrible, but it has been done. Take a look at:

drivers/net/ethernet/i825xx/lasi_82596.c and
drivers/net/ethernet/seeq/sgiseeq.c

2021-06-09 03:57:00

by David Laight

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: Christoph Hellwig
> Sent: 07 June 2021 07:27
>
> On Mon, Jun 07, 2021 at 11:19:03AM +0800, Guo Ren wrote:
> > >From Linux non-coherency view, we need:
> > - Non-cache + Strong Order PTE attributes to deal with drivers' DMA descriptors
> > - Non-cache + weak order to deal with framebuffer drivers
> > - CMO dma_sync to sync cache with DMA devices
>
> This is not strictly true. At the very minimum you only need cache
> invalidation and writeback instructions. For example early parisc
> CPUs and some m68knommu SOCs have no support for uncached areas at all,
> and Linux works. But to be fair this is very painful and supports only
> very limited periphals. So for modern full Linux support some uncahed
> memory is advisable. But that doesn't have to be using PTE attributes.
> It could also be physical memory regions that are either totally fixed
> or somewhat dynamic.

It is almost impossible to interface to many ethernet chips without
either coherent or uncached memory for the descriptor rings.
The status bits on the transmit ring are particularly problematic.

The receive ring can be done with writeback+invalidate provided you
fill a cache line at a time.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-06-09 05:54:35

by David Laight

[permalink] [raw]
Subject: RE: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

From: 'Christoph Hellwig'
> Sent: 08 June 2021 16:32
>
> On Tue, Jun 08, 2021 at 03:00:17PM +0000, David Laight wrote:
> > It is almost impossible to interface to many ethernet chips without
> > either coherent or uncached memory for the descriptor rings.
> > The status bits on the transmit ring are particularly problematic.
> >
> > The receive ring can be done with writeback+invalidate provided you
> > fill a cache line at a time.
>
> It is horrible, but it has been done. Take a look at:
>
> drivers/net/ethernet/i825xx/lasi_82596.c and
> drivers/net/ethernet/seeq/sgiseeq.c

I guess that each transmit has to be split into enough
fragments that they fill a cache line.
That won't work with some (probably old now) devices that
require the first fragment to be 64 bytes because it won't
back-up the descriptors after a collision.

It's all as horrid as a DSP we have that can't receive ethernet
frames onto a 4n+2 boundary and doesn't support misaligned accesses.

Mind you, Sun's original Sbus ethernet board had to be given
a 4n aligned rx buffer and then a misaligned copy done in kernel
in order to not drop packets!

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-06-09 13:05:00

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, 9 Jun 2021 11:28:19 +0800
Guo Ren <[email protected]> wrote:


>
>
> On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]> wrote:
> >
> > Στις 2021-05-20 04:45, Guo Ren έγραψε:
> > > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
> > >>
> > >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> > >> > This patch series looks like it might be useful for the StarFive JH7100
> > >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> > >> > USB and SDIO require that the L2 cache must be manually flushed after
> > >> > DMA operations if the data is intended to be shared with U74 cores [2].
> > >>
> > >> Not too much, given that the SiFive lineage CPUs have an uncached
> > >> window, that is a totally different way to allocate uncached memory.
> > > It's a very big MIPS smell. What's the attribute of the uncached
> > > window? (uncached + strong-order/ uncached + weak, most vendors still
> > > use AXI interconnect, how to deal with a bufferable attribute?) In
> > > fact, customers' drivers use different ways to deal with DMA memory in
> > > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> > > the same way in DMA memory is a smart choice. So using PTE attributes
> > > is more suitable.
> > >

<snip>

> > > 4.4.1
> > > The draft supports custom attribute bits in PTE.
> > >
> >
> > Not only it doesn't support custom attributes on PTEs:
> >
> > "Bits63–54 are reserved for future standard use and must be zeroed by
> > software for forward compatibility."
> >
> > It also goes further to say that:
> >
> > "if any of these bits are set, a page-fault exception is raised"
> Agree, when our processor's mmu works in compatible mmu, we must keep
> "Bits63–54 bit" zero in Linux.
> So, I think this is the first version of the PTE format.
>
> If the "PBMT" extension proposal is approved, it will cause the second
> version of the PTE format.
>
> Maybe in the future, we'll get more versions of the PTE formats.
>
> So, seems Linux must support multi versions of PTE formats with one
> Image, right?
>
> Okay, we could stop arguing with the D1 PTE format. And talk about how
> to let Linux support multi versions of PTE formats that come from the
> future RISC-V privilege spec.
>

Just my humble opinion:
When those bits(63~54) usage are standardized in future RISC-V privilege spec
generic Image can still be supported with the following solutions:

*alternative patch only fly:
If the bit is only need to be set during init, we may insert nop instruction(s)
at proper place, then patch the nop into set_the_target_bit instruction(s) by
hart's feature.

*normal check feature then use:
If the feature needs a bit complex code, we could go through the "feature check
then use". static key tech can be used here to avoid branches.

2021-06-09 17:05:59

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]> wrote:
>
> Στις 2021-05-20 04:45, Guo Ren έγραψε:
> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
> >>
> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> >> > This patch series looks like it might be useful for the StarFive JH7100
> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> >> > USB and SDIO require that the L2 cache must be manually flushed after
> >> > DMA operations if the data is intended to be shared with U74 cores [2].
> >>
> >> Not too much, given that the SiFive lineage CPUs have an uncached
> >> window, that is a totally different way to allocate uncached memory.
> > It's a very big MIPS smell. What's the attribute of the uncached
> > window? (uncached + strong-order/ uncached + weak, most vendors still
> > use AXI interconnect, how to deal with a bufferable attribute?) In
> > fact, customers' drivers use different ways to deal with DMA memory in
> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> > the same way in DMA memory is a smart choice. So using PTE attributes
> > is more suitable.
> >
> > See:
> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
> > 4.4.1
> > The draft supports custom attribute bits in PTE.
> >
>
> Not only it doesn't support custom attributes on PTEs:
>
> "Bits63–54 are reserved for future standard use and must be zeroed by
> software for forward compatibility."
>
> It also goes further to say that:
>
> "if any of these bits are set, a page-fault exception is raised"
Agree, when our processor's mmu works in compatible mmu, we must keep
"Bits63–54 bit" zero in Linux.
So, I think this is the first version of the PTE format.

If the "PBMT" extension proposal is approved, it will cause the second
version of the PTE format.

Maybe in the future, we'll get more versions of the PTE formats.

So, seems Linux must support multi versions of PTE formats with one
Image, right?

Okay, we could stop arguing with the D1 PTE format. And talk about how
to let Linux support multi versions of PTE formats that come from the
future RISC-V privilege spec.

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/

2021-06-09 17:17:18

by Nick Kossifidis

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

Στις 2021-06-09 06:28, Guo Ren έγραψε:
> On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]>
> wrote:
>>
>> Στις 2021-05-20 04:45, Guo Ren έγραψε:
>> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
>> >>
>> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
>> >> > This patch series looks like it might be useful for the StarFive JH7100
>> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
>> >> > USB and SDIO require that the L2 cache must be manually flushed after
>> >> > DMA operations if the data is intended to be shared with U74 cores [2].
>> >>
>> >> Not too much, given that the SiFive lineage CPUs have an uncached
>> >> window, that is a totally different way to allocate uncached memory.
>> > It's a very big MIPS smell. What's the attribute of the uncached
>> > window? (uncached + strong-order/ uncached + weak, most vendors still
>> > use AXI interconnect, how to deal with a bufferable attribute?) In
>> > fact, customers' drivers use different ways to deal with DMA memory in
>> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
>> > the same way in DMA memory is a smart choice. So using PTE attributes
>> > is more suitable.
>> >
>> > See:
>> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
>> > 4.4.1
>> > The draft supports custom attribute bits in PTE.
>> >
>>
>> Not only it doesn't support custom attributes on PTEs:
>>
>> "Bits63–54 are reserved for future standard use and must be zeroed by
>> software for forward compatibility."
>>
>> It also goes further to say that:
>>
>> "if any of these bits are set, a page-fault exception is raised"
> Agree, when our processor's mmu works in compatible mmu, we must keep
> "Bits63–54 bit" zero in Linux.
> So, I think this is the first version of the PTE format.
>
> If the "PBMT" extension proposal is approved, it will cause the second
> version of the PTE format.
>
> Maybe in the future, we'll get more versions of the PTE formats.
>
> So, seems Linux must support multi versions of PTE formats with one
> Image, right?
>
> Okay, we could stop arguing with the D1 PTE format. And talk about how
> to let Linux support multi versions of PTE formats that come from the
> future RISC-V privilege spec.

The RISC-V ISA specs are meant to be backwards compatible, so newer PTE
versions should work on older devices (note that the spec says that
software must set those bits to zero for "forward compatibility" and are
"reserved for future use" so current implementations must ignore them).
Obviously the proposed "if any of these bits are set, a page-fault
exception is raised" will break backwards compatibility which is why we
need to ask for it to be removed from the draft.

As an example the PBMT proposal uses bits 62:61 that on older hw should
be ignored ("reserved for future use"), if Linux uses those bits we
won't need a different code path for supporting older hw/older PTE
versions, we'll just set them and older hw will ignore them. Because of
the guarantee that ISA specs maintain backwards compatibility, the
functionality of bits 62:61 is guaranteed to remain backwards
compatible.

In other words we don't need any special handling of multiple PTE
formats, we just need to support the latest Priv. Spec and the Spec
itself will guarantee backwards compatibility.

2021-06-09 17:26:20

by Guo Ren

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] riscv: Add DMA_COHERENT support

On Wed, Jun 9, 2021 at 5:45 PM Nick Kossifidis <[email protected]> wrote:
>
> Στις 2021-06-09 06:28, Guo Ren έγραψε:
> > On Mon, Jun 7, 2021 at 2:14 AM Nick Kossifidis <[email protected]>
> > wrote:
> >>
> >> Στις 2021-05-20 04:45, Guo Ren έγραψε:
> >> > On Wed, May 19, 2021 at 2:53 PM Christoph Hellwig <[email protected]> wrote:
> >> >>
> >> >> On Tue, May 18, 2021 at 11:44:35PM -0700, Drew Fustini wrote:
> >> >> > This patch series looks like it might be useful for the StarFive JH7100
> >> >> > [1] [2] too as it has peripherals on a non-coherent interconnect. GMAC,
> >> >> > USB and SDIO require that the L2 cache must be manually flushed after
> >> >> > DMA operations if the data is intended to be shared with U74 cores [2].
> >> >>
> >> >> Not too much, given that the SiFive lineage CPUs have an uncached
> >> >> window, that is a totally different way to allocate uncached memory.
> >> > It's a very big MIPS smell. What's the attribute of the uncached
> >> > window? (uncached + strong-order/ uncached + weak, most vendors still
> >> > use AXI interconnect, how to deal with a bufferable attribute?) In
> >> > fact, customers' drivers use different ways to deal with DMA memory in
> >> > non-coherent SOC. Most riscv SOC vendors are from ARM, so giving them
> >> > the same way in DMA memory is a smart choice. So using PTE attributes
> >> > is more suitable.
> >> >
> >> > See:
> >> > https://github.com/riscv/virtual-memory/blob/main/specs/611-virtual-memory-diff.pdf
> >> > 4.4.1
> >> > The draft supports custom attribute bits in PTE.
> >> >
> >>
> >> Not only it doesn't support custom attributes on PTEs:
> >>
> >> "Bits63–54 are reserved for future standard use and must be zeroed by
> >> software for forward compatibility."
> >>
> >> It also goes further to say that:
> >>
> >> "if any of these bits are set, a page-fault exception is raised"
> > Agree, when our processor's mmu works in compatible mmu, we must keep
> > "Bits63–54 bit" zero in Linux.
> > So, I think this is the first version of the PTE format.
> >
> > If the "PBMT" extension proposal is approved, it will cause the second
> > version of the PTE format.
> >
> > Maybe in the future, we'll get more versions of the PTE formats.
> >
> > So, seems Linux must support multi versions of PTE formats with one
> > Image, right?
> >
> > Okay, we could stop arguing with the D1 PTE format. And talk about how
> > to let Linux support multi versions of PTE formats that come from the
> > future RISC-V privilege spec.
>
> The RISC-V ISA specs are meant to be backwards compatible, so newer PTE
> versions should work on older devices (note that the spec says that
> software must set those bits to zero for "forward compatibility" and are
> "reserved for future use" so current implementations must ignore them).
> Obviously the proposed "if any of these bits are set, a page-fault
> exception is raised" will break backwards compatibility which is why we
> need to ask for it to be removed from the draft.
>
> As an example the PBMT proposal uses bits 62:61 that on older hw should
> be ignored ("reserved for future use"), if Linux uses those bits we
> won't need a different code path for supporting older hw/older PTE
> versions, we'll just set them and older hw will ignore them. Because of
> the guarantee that ISA specs maintain backwards compatibility, the
> functionality of bits 62:61 is guaranteed to remain backwards
> compatible.
the spec says that software must set those bits to zero for "forward
compatibility". So how older hw ignore them?
If an older hw follow the current spec requires software to set those
bits to zero, how we put any PBMT bits without different Linux PTE
formats?

>
> In other words we don't need any special handling of multiple PTE
> formats, we just need to support the latest Priv. Spec and the Spec
> itself will guarantee backwards compatibility.
Nak, totally no Logically self-consistent.

--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/