2012-10-24 08:04:30

by Gregory CLEMENT

[permalink] [raw]
Subject: [PATCH 0/2] Add hardware I/O coherency support for Armada 370/XP

Hello,

The purpose of this patch set is to add hardware I/O Coherency support
for Armada 370 and Armada XP. Theses SoCs come with an unit called
coherency fabric. A beginning of the support for this unit have been
introduced with the SMP patch set. This series extend this support:
the coherency fabric unit allows to use the Armada XP and the Armada
370 as nearly coherent architectures.

The second patches enables this new feature and register our own set
of DMA ops, to benefit this hardware enhancement.

The first patch introduces a new flag for the address decoding
configuration in order to be able to set the memory windows as
shared memory.

This series depend on the SMP patch set posted on Monday
(http://thread.gmane.org/gmane.linux.ports.arm.kernel/194901).

Regards,

Gregory

Gregory CLEMENT (2):
arm: plat-orion: Add coherency attribute when setup mbus target
arm: mvebu: Add hardware I/O Coherency support

arch/arm/boot/dts/armada-370-xp.dtsi | 3 +-
arch/arm/mach-mvebu/addr-map.c | 3 +
arch/arm/mach-mvebu/armada-370-xp.c | 1 +
arch/arm/mach-mvebu/coherency.c | 87 +++++++++++++++++++++++++++
arch/arm/mach-mvebu/common.h | 2 +
arch/arm/plat-orion/addr-map.c | 4 ++
arch/arm/plat-orion/include/plat/addr-map.h | 1 +
7 files changed, 100 insertions(+), 1 deletion(-)

--
1.7.9.5


2012-10-24 08:04:33

by Gregory CLEMENT

[permalink] [raw]
Subject: [PATCH 1/2] arm: plat-orion: Add coherency attribute when setup mbus target

Recent SoC such as Armada 370/XP came with the possibility to deal
with the I/O coherency by hardware. In this case the transaction
attribute of the window must be flagged as "Shared transaction". Once
this flag is set, then the transactions will be forced to be sent
through the coherency block, in other case transaction is driven
directly to DRAM.

Signed-off-by: Gregory CLEMENT <[email protected]>
Reviewed-by: Yehuda Yitschak <[email protected]>
---
arch/arm/plat-orion/addr-map.c | 4 ++++
arch/arm/plat-orion/include/plat/addr-map.h | 1 +
2 files changed, 5 insertions(+)

diff --git a/arch/arm/plat-orion/addr-map.c b/arch/arm/plat-orion/addr-map.c
index a7b8060..febe386 100644
--- a/arch/arm/plat-orion/addr-map.c
+++ b/arch/arm/plat-orion/addr-map.c
@@ -42,6 +42,8 @@ EXPORT_SYMBOL_GPL(mv_mbus_dram_info);
#define WIN_REMAP_LO_OFF 0x0008
#define WIN_REMAP_HI_OFF 0x000c

+#define ATTR_HW_COHERENCY (0x1 << 4)
+
/*
* Default implementation
*/
@@ -163,6 +165,8 @@ void __init orion_setup_cpu_mbus_target(const struct orion_addr_map_cfg *cfg,
w = &orion_mbus_dram_info.cs[cs++];
w->cs_index = i;
w->mbus_attr = 0xf & ~(1 << i);
+ if (cfg->hw_io_coherency)
+ w->mbus_attr |= ATTR_HW_COHERENCY;
w->base = base & 0xffff0000;
w->size = (size | 0x0000ffff) + 1;
}
diff --git a/arch/arm/plat-orion/include/plat/addr-map.h b/arch/arm/plat-orion/include/plat/addr-map.h
index ec63e4a..b76c065 100644
--- a/arch/arm/plat-orion/include/plat/addr-map.h
+++ b/arch/arm/plat-orion/include/plat/addr-map.h
@@ -17,6 +17,7 @@ struct orion_addr_map_cfg {
const int num_wins; /* Total number of windows */
const int remappable_wins;
void __iomem *bridge_virt_base;
+ int hw_io_coherency;

/* If NULL, the default cpu_win_can_remap will be used, using
the value in remappable_wins */
--
1.7.9.5

2012-10-24 08:04:42

by Gregory CLEMENT

[permalink] [raw]
Subject: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

Armada 370 and XP come with an unit called coherency fabric. This unit
allows to use the Armada XP as a nearly coherent architecture. The
coherency mechanism uses snoop filters to ensure the coherency between
caches, DRAM and devices. This mechanism needs a synchronization
barrier which guarantees that all memory write initiated by the
devices has reached their target and do not reside in intermediate
write buffers. That's why the architecture is not totally coherent and
we need to provide our own functions for some DMA operations.

Beside the use of the coherency fabric, the device units will have to
set the attribute flag to select the accurate coherency process for
the memory transaction. This is done each device driver programs the
DRAM address windows. The value of the attribute set by the driver is
retrieved through the orion_addr_map_cfg struct filled during the
early initialization of the platform.

Signed-off-by: Gregory CLEMENT <[email protected]>
Reviewed-by: Yehuda Yitschak <[email protected]>
---
arch/arm/boot/dts/armada-370-xp.dtsi | 3 +-
arch/arm/mach-mvebu/addr-map.c | 3 ++
arch/arm/mach-mvebu/armada-370-xp.c | 1 +
arch/arm/mach-mvebu/coherency.c | 87 ++++++++++++++++++++++++++++++++++
arch/arm/mach-mvebu/common.h | 2 +
5 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
index 18ba60b..af22e53 100644
--- a/arch/arm/boot/dts/armada-370-xp.dtsi
+++ b/arch/arm/boot/dts/armada-370-xp.dtsi
@@ -38,7 +38,8 @@

coherency-fabric@d0020200 {
compatible = "marvell,coherency-fabric";
- reg = <0xd0020200 0xb0>;
+ reg = <0xd0020200 0xb0>,
+ <0xd0021010 0x1c>;
};

soc {
diff --git a/arch/arm/mach-mvebu/addr-map.c b/arch/arm/mach-mvebu/addr-map.c
index fe454a4..595f6b7 100644
--- a/arch/arm/mach-mvebu/addr-map.c
+++ b/arch/arm/mach-mvebu/addr-map.c
@@ -108,6 +108,9 @@ static int __init armada_setup_cpu_mbus(void)

addr_map_cfg.bridge_virt_base = mbus_unit_addr_decoding_base;

+ if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
+ addr_map_cfg.hw_io_coherency = 1;
+
/*
* Disable, clear and configure windows.
*/
diff --git a/arch/arm/mach-mvebu/armada-370-xp.c b/arch/arm/mach-mvebu/armada-370-xp.c
index 41431a1..3517f7d 100644
--- a/arch/arm/mach-mvebu/armada-370-xp.c
+++ b/arch/arm/mach-mvebu/armada-370-xp.c
@@ -49,6 +49,7 @@ struct sys_timer armada_370_xp_timer = {

static void __init armada_370_xp_dt_init(void)
{
+ armada_370_xp_coherency_iocache_init();
of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
}

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 71e27ba..a596ca9 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -22,6 +22,10 @@
#include <linux/of_address.h>
#include <linux/io.h>
#include <linux/smp.h>
+#include <linux/dma-mapping.h>
+#include <linux/platform_device.h>
+#include <asm/smp_plat.h>
+
#include "armada-370-xp.h"

/* Some functions in this file are called very early during SMP
@@ -31,16 +35,53 @@
* value matching its virtual mapping
*/
static void __iomem *coherency_base = ARMADA_370_XP_REGS_VIRT_BASE + 0x20200;
+static void __iomem *coherency_cpu_base;
+
+struct dma_map_ops armada_xp_dma_ops;

/* Coherency fabric registers */
#define COHERENCY_FABRIC_CTL_OFFSET 0x0
#define COHERENCY_FABRIC_CFG_OFFSET 0x4

+#define IO_SYNC_BARRIER_CTL_OFFSET 0x0
+
static struct of_device_id of_coherency_table[] = {
{.compatible = "marvell,coherency-fabric"},
{ /* end of list */ },
};

+static inline void armada_xp_sync_io_barrier(void)
+{
+ writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
+ while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET) & 0x1);
+}
+
+dma_addr_t armada_xp_dma_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ if (dir != DMA_TO_DEVICE)
+ armada_xp_sync_io_barrier();
+ return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+}
+
+
+void armada_xp_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+ size_t size, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+ if (dir != DMA_TO_DEVICE)
+ armada_xp_sync_io_barrier();
+}
+
+void armada_xp_dma_sync(struct device *dev, dma_addr_t dma_handle,
+ size_t size, enum dma_data_direction dir)
+{
+ if (dir != DMA_TO_DEVICE)
+ armada_xp_sync_io_barrier();
+}
+
int armada_xp_get_cpu_count(void)
{
int reg, cnt;
@@ -74,6 +115,42 @@ int armada_370_xp_set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
return 0;
}

+static int armada_xp_platform_notifier(struct notifier_block *nb,
+ unsigned long event, void *__dev)
+{
+ struct device *dev = __dev;
+
+ if (event != BUS_NOTIFY_ADD_DEVICE)
+ return NOTIFY_DONE;
+ set_dma_ops(dev, &armada_xp_dma_ops);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block armada_xp_platform_nb = {
+ .notifier_call = armada_xp_platform_notifier,
+};
+
+void __init armada_370_xp_coherency_iocache_init(void)
+{
+ /* When the coherency fabric is available, the Armada XP and
+ * Aramada 370 are close to a coherent architecture, so we based
+ * our dma ops on the coherent one, and just changes the
+ * operations which need a arch io sync */
+ if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
+ struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
+ memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
+ dma_ops->map_page = armada_xp_dma_map_page;
+ dma_ops->unmap_page = armada_xp_dma_unmap_page;
+ dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
+ dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
+ dma_ops->sync_single_for_device = armada_xp_dma_sync;
+ dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
+ dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
+ }
+ bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);
+}
+
int __init armada_370_xp_coherency_init(void)
{
struct device_node *np;
@@ -82,7 +159,17 @@ int __init armada_370_xp_coherency_init(void)
if (np) {
pr_info("Initializing Coherency fabric\n");
coherency_base = of_iomap(np, 0);
+ coherency_cpu_base = of_iomap(np, 1);
+ }
+#ifndef CONFIG_SMP
+ if (coherency_base) {
+ /* In UP case, cpu coherency is enabled here, in SMP case cpu
+ * coherency is enabled for each CPU by
+ * armada_xp_smp_prepare_cpus() in platsmp.c */
+ int hw_cpuid = cpu_logical_map(smp_processor_id());
+ armada_370_xp_set_cpu_coherent(hw_cpuid, 0);
}
+#endif

return 0;
}
diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-mvebu/common.h
index 86484bb..fff952e 100644
--- a/arch/arm/mach-mvebu/common.h
+++ b/arch/arm/mach-mvebu/common.h
@@ -23,6 +23,8 @@ void armada_370_xp_handle_irq(struct pt_regs *regs);

void armada_xp_cpu_die(unsigned int cpu);

+void armada_370_xp_coherency_iocache_init(void);
+
int armada_370_xp_coherency_init(void);
int armada_370_xp_pmsu_init(void);
void armada_xp_secondary_startup(void);
--
1.7.9.5

2012-10-24 08:17:12

by Yehuda Yitschak

[permalink] [raw]
Subject: RE: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support



> -----Original Message-----
> From: Gregory CLEMENT [mailto:[email protected]]
> Sent: Wednesday, October 24, 2012 10:04 AM
> To: Jason Cooper; Andrew Lunn; Gregory Clement
> Cc: [email protected]; Arnd Bergmann; Olof Johansson;
> Russell King; Rob Herring; Ben Dooks; Ian Molton; Nicolas Pitre; Lior
> Amsalem; Maen Suleiman; Tawfik Bayouk; Shadi Ammouri; Eran Ben-Avi;
> Yehuda Yitschak; Nadav Haklai; Ike Pan; Jani Monoses; Chris Van Hoof; Dan
> Frazier; Thomas Petazzoni; Leif Lindholm; Jon Masters; David Marlin;
> Sebastian Hesselbarth; [email protected]
> Subject: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support
>
> Armada 370 and XP come with an unit called coherency fabric. This unit
> allows to use the Armada XP as a nearly coherent architecture. The
> coherency mechanism uses snoop filters to ensure the coherency between
> caches, DRAM and devices. This mechanism needs a synchronization barrier
> which guarantees that all memory write initiated by the devices has
> reached their target and do not reside in intermediate write buffers. That's
> why the architecture is not totally coherent and we need to provide our
> own functions for some DMA operations.
>
> Beside the use of the coherency fabric, the device units will have to set the
> attribute flag to select the accurate coherency process for the memory
> transaction. This is done each device driver programs the DRAM address
> windows. The value of the attribute set by the driver is retrieved through
> the orion_addr_map_cfg struct filled during the early initialization of the
> platform.
>
> Signed-off-by: Gregory CLEMENT <[email protected]>
> Reviewed-by: Yehuda Yitschak <[email protected]>
> ---
> arch/arm/boot/dts/armada-370-xp.dtsi | 3 +-
> arch/arm/mach-mvebu/addr-map.c | 3 ++
> arch/arm/mach-mvebu/armada-370-xp.c | 1 +
> arch/arm/mach-mvebu/coherency.c | 87
> ++++++++++++++++++++++++++++++++++
> arch/arm/mach-mvebu/common.h | 2 +
> 5 files changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi
> b/arch/arm/boot/dts/armada-370-xp.dtsi
> index 18ba60b..af22e53 100644
> --- a/arch/arm/boot/dts/armada-370-xp.dtsi
> +++ b/arch/arm/boot/dts/armada-370-xp.dtsi
> @@ -38,7 +38,8 @@
>
> coherency-fabric@d0020200 {
> compatible = "marvell,coherency-fabric";
> - reg = <0xd0020200 0xb0>;
> + reg = <0xd0020200 0xb0>,
> + <0xd0021010 0x1c>;
> };
>
> soc {
> diff --git a/arch/arm/mach-mvebu/addr-map.c b/arch/arm/mach-
> mvebu/addr-map.c index fe454a4..595f6b7 100644
> --- a/arch/arm/mach-mvebu/addr-map.c
> +++ b/arch/arm/mach-mvebu/addr-map.c
> @@ -108,6 +108,9 @@ static int __init armada_setup_cpu_mbus(void)
>
> addr_map_cfg.bridge_virt_base = mbus_unit_addr_decoding_base;
>
> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-
> fabric"))
> + addr_map_cfg.hw_io_coherency = 1;
> +
> /*
> * Disable, clear and configure windows.
> */
> diff --git a/arch/arm/mach-mvebu/armada-370-xp.c b/arch/arm/mach-
> mvebu/armada-370-xp.c
> index 41431a1..3517f7d 100644
> --- a/arch/arm/mach-mvebu/armada-370-xp.c
> +++ b/arch/arm/mach-mvebu/armada-370-xp.c
> @@ -49,6 +49,7 @@ struct sys_timer armada_370_xp_timer = {
>
> static void __init armada_370_xp_dt_init(void) {
> + armada_370_xp_coherency_iocache_init();
> of_platform_populate(NULL, of_default_bus_match_table, NULL,
> NULL); }
>
> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-
> mvebu/coherency.c index 71e27ba..a596ca9 100644
> --- a/arch/arm/mach-mvebu/coherency.c
> +++ b/arch/arm/mach-mvebu/coherency.c
> @@ -22,6 +22,10 @@
> #include <linux/of_address.h>
> #include <linux/io.h>
> #include <linux/smp.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/platform_device.h>
> +#include <asm/smp_plat.h>
> +
> #include "armada-370-xp.h"
>
> /* Some functions in this file are called very early during SMP @@ -31,16
> +35,53 @@
> * value matching its virtual mapping
> */
> static void __iomem *coherency_base = ARMADA_370_XP_REGS_VIRT_BASE
> + 0x20200;
> +static void __iomem *coherency_cpu_base;
> +
> +struct dma_map_ops armada_xp_dma_ops;
>
> /* Coherency fabric registers */
> #define COHERENCY_FABRIC_CTL_OFFSET 0x0
> #define COHERENCY_FABRIC_CFG_OFFSET 0x4
>
> +#define IO_SYNC_BARRIER_CTL_OFFSET 0x0
> +
> static struct of_device_id of_coherency_table[] = {
> {.compatible = "marvell,coherency-fabric"},
> { /* end of list */ },
> };
>
> +static inline void armada_xp_sync_io_barrier(void) {
> + writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
> + while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET)
> & 0x1);
> +}
> +
> +dma_addr_t armada_xp_dma_map_page(struct device *dev, struct page
> *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir,
> + struct dma_attrs *attrs)
> +{
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> + return pfn_to_dma(dev, page_to_pfn(page)) + offset; }
> +
> +
> +void armada_xp_dma_unmap_page(struct device *dev, dma_addr_t
> dma_handle,
> + size_t size, enum dma_data_direction dir,
> + struct dma_attrs *attrs)
> +{
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> +}
> +
> +void armada_xp_dma_sync(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction dir) {
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> +}
> +

Shouldn't all the 4 functions above start with armada_370_xp and not armada_xp ?


> int armada_xp_get_cpu_count(void)

This function can be limited to CONFIG_SP

> {
> int reg, cnt;
> @@ -74,6 +115,42 @@ int armada_370_xp_set_cpu_coherent(unsigned int
> hw_cpu_id, int smp_group_id)
> return 0;
> }
>
> +static int armada_xp_platform_notifier(struct notifier_block *nb,
> + unsigned long event, void *__dev) {
> + struct device *dev = __dev;
> +
> + if (event != BUS_NOTIFY_ADD_DEVICE)
> + return NOTIFY_DONE;
> + set_dma_ops(dev, &armada_xp_dma_ops);
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block armada_xp_platform_nb = {
> + .notifier_call = armada_xp_platform_notifier, };
> +
> +void __init armada_370_xp_coherency_iocache_init(void)
> +{
> + /* When the coherency fabric is available, the Armada XP and
> + * Aramada 370 are close to a coherent architecture, so we based
> + * our dma ops on the coherent one, and just changes the
> + * operations which need a arch io sync */
> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-
> fabric")) {
> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
> + memcpy(dma_ops, &arm_coherent_dma_ops,
> sizeof(*dma_ops));
> + dma_ops->map_page = armada_xp_dma_map_page;
> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
> + dma_ops->sync_sg_for_cpu =
> arm_dma_ops.sync_sg_for_cpu;
> + dma_ops->sync_sg_for_device =
> arm_dma_ops.sync_sg_for_device;
> + }
> + bus_register_notifier(&platform_bus_type,
> &armada_xp_platform_nb); }
> +
> int __init armada_370_xp_coherency_init(void)
> {
> struct device_node *np;
> @@ -82,7 +159,17 @@ int __init armada_370_xp_coherency_init(void)
> if (np) {
> pr_info("Initializing Coherency fabric\n");
> coherency_base = of_iomap(np, 0);
> + coherency_cpu_base = of_iomap(np, 1);
> + }
> +#ifndef CONFIG_SMP
> + if (coherency_base) {
> + /* In UP case, cpu coherency is enabled here, in SMP case
> cpu
> + * coherency is enabled for each CPU by
> + * armada_xp_smp_prepare_cpus() in platsmp.c */
> + int hw_cpuid = cpu_logical_map(smp_processor_id());
> + armada_370_xp_set_cpu_coherent(hw_cpuid, 0);
> }
> +#endif
>
> return 0;
> }
> diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-
> mvebu/common.h index 86484bb..fff952e 100644
> --- a/arch/arm/mach-mvebu/common.h
> +++ b/arch/arm/mach-mvebu/common.h
> @@ -23,6 +23,8 @@ void armada_370_xp_handle_irq(struct pt_regs *regs);
>
> void armada_xp_cpu_die(unsigned int cpu);
>
> +void armada_370_xp_coherency_iocache_init(void);
> +
> int armada_370_xp_coherency_init(void);
> int armada_370_xp_pmsu_init(void);
> void armada_xp_secondary_startup(void);
> --
> 1.7.9.5

2012-10-24 11:36:43

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On Wednesday 24 October 2012, Gregory CLEMENT wrote:
> +void __init armada_370_xp_coherency_iocache_init(void)
> +{
> + /* When the coherency fabric is available, the Armada XP and
> + * Aramada 370 are close to a coherent architecture, so we based
> + * our dma ops on the coherent one, and just changes the
> + * operations which need a arch io sync */
> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
> + memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
> + dma_ops->map_page = armada_xp_dma_map_page;
> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
> + dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
> + dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
> + }
> + bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);

I think it would be cleaner to statically define the operations in a constant
structure and point directly to the functions you need. If necessary, use
multiple structures.

Arnd

2012-10-24 11:48:29

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On 10/24/2012 01:36 PM, Arnd Bergmann wrote:
> On Wednesday 24 October 2012, Gregory CLEMENT wrote:
>> +void __init armada_370_xp_coherency_iocache_init(void)
>> +{
>> + /* When the coherency fabric is available, the Armada XP and
>> + * Aramada 370 are close to a coherent architecture, so we based
>> + * our dma ops on the coherent one, and just changes the
>> + * operations which need a arch io sync */
>> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
>> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
>> + memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
>> + dma_ops->map_page = armada_xp_dma_map_page;
>> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
>> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
>> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
>> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
>> + dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
>> + dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
>> + }
>> + bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);
>
> I think it would be cleaner to statically define the operations in a constant
> structure and point directly to the functions you need. If necessary, use
> multiple structures.

My problem was that these functions are not exposed, only arm_dma_op and
arm_coherent_dma_ops are exported.
Or do you think about something like this:

struct dma_map_ops *dma_ops = {
.alloc = arm_dma_ops.arm_coherent_dma_alloc,
.free = arm_dma_ops.arm_coherent_dma_free,
.mmap = arm_dma_ops.arm_dma_mmap,
.get_sgtable = arm_dma_ops.arm_dma_get_sgtable,
.map_sg = arm_dma_ops.arm_dma_map_sg,
.set_dma_mask = arm_dma_ops.arm_dma_set_mask,
.map_page = armada_xp_dma_map_page,
.unmap_page = armada_xp_dma_unmap_page,
.unmap_sg = arm_dma_ops.unmap_sg,
.sync_single_for_cpu = armada_xp_dma_sync,
.sync_single_for_device = armada_xp_dma_sync,
.sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_ops.sync_sg_for_device,
};




>
> Arnd
>


--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 11:50:33

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On 10/24/2012 10:25 AM, Andrew Lunn wrote:
> On Wed, Oct 24, 2012 at 10:04:01AM +0200, Gregory CLEMENT wrote:
>> Armada 370 and XP come with an unit called coherency fabric. This unit
>> allows to use the Armada XP as a nearly coherent architecture. The
>> coherency mechanism uses snoop filters to ensure the coherency between
>> caches, DRAM and devices. This mechanism needs a synchronization
>> barrier which guarantees that all memory write initiated by the
>> devices has reached their target and do not reside in intermediate
>> write buffers. That's why the architecture is not totally coherent and
>> we need to provide our own functions for some DMA operations.
>>
>> Beside the use of the coherency fabric, the device units will have to
>> set the attribute flag to select the accurate coherency process for
>> the memory transaction. This is done each device driver programs the
>> DRAM address windows. The value of the attribute set by the driver is
>> retrieved through the orion_addr_map_cfg struct filled during the
>> early initialization of the platform.
>>
>> Signed-off-by: Gregory CLEMENT <[email protected]>
>> Reviewed-by: Yehuda Yitschak <[email protected]>
>> ---
>> arch/arm/boot/dts/armada-370-xp.dtsi | 3 +-
>> arch/arm/mach-mvebu/addr-map.c | 3 ++
>> arch/arm/mach-mvebu/armada-370-xp.c | 1 +
>> arch/arm/mach-mvebu/coherency.c | 87 ++++++++++++++++++++++++++++++++++
>> arch/arm/mach-mvebu/common.h | 2 +
>> 5 files changed, 95 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
>> index 18ba60b..af22e53 100644
>> --- a/arch/arm/boot/dts/armada-370-xp.dtsi
>> +++ b/arch/arm/boot/dts/armada-370-xp.dtsi
>> @@ -38,7 +38,8 @@
>>
>> coherency-fabric@d0020200 {
>> compatible = "marvell,coherency-fabric";
>> - reg = <0xd0020200 0xb0>;
>> + reg = <0xd0020200 0xb0>,
>> + <0xd0021010 0x1c>;
>> };
>
> ...
>
>> int __init armada_370_xp_coherency_init(void)
>> {
>> struct device_node *np;
>> @@ -82,7 +159,17 @@ int __init armada_370_xp_coherency_init(void)
>> if (np) {
>> pr_info("Initializing Coherency fabric\n");
>> coherency_base = of_iomap(np, 0);
>> + coherency_cpu_base = of_iomap(np, 1);
>
> Is this already in the binding documentation?

No indeed, the documentation should be completed. I will do it
the V2

>
> Thanks
> Andrew
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>


--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 11:53:48

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On 10/24/2012 01:48 PM, Gregory CLEMENT wrote:
> On 10/24/2012 01:36 PM, Arnd Bergmann wrote:
>> On Wednesday 24 October 2012, Gregory CLEMENT wrote:
>>> +void __init armada_370_xp_coherency_iocache_init(void)
>>> +{
>>> + /* When the coherency fabric is available, the Armada XP and
>>> + * Aramada 370 are close to a coherent architecture, so we based
>>> + * our dma ops on the coherent one, and just changes the
>>> + * operations which need a arch io sync */
>>> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
>>> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
>>> + memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
>>> + dma_ops->map_page = armada_xp_dma_map_page;
>>> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
>>> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
>>> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
>>> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
>>> + dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
>>> + dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
>>> + }
>>> + bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);
>>
>> I think it would be cleaner to statically define the operations in a constant
>> structure and point directly to the functions you need. If necessary, use
>> multiple structures.
>
> My problem was that these functions are not exposed, only arm_dma_op and
> arm_coherent_dma_ops are exported.
> Or do you think about something like this:
>
> struct dma_map_ops *dma_ops = {
> .alloc = arm_dma_ops.arm_coherent_dma_alloc,
> .free = arm_dma_ops.arm_coherent_dma_free,
> .mmap = arm_dma_ops.arm_dma_mmap,
> .get_sgtable = arm_dma_ops.arm_dma_get_sgtable,
> .map_sg = arm_dma_ops.arm_dma_map_sg,
> .set_dma_mask = arm_dma_ops.arm_dma_set_mask,
> .map_page = armada_xp_dma_map_page,
> .unmap_page = armada_xp_dma_unmap_page,
> .unmap_sg = arm_dma_ops.unmap_sg,
> .sync_single_for_cpu = armada_xp_dma_sync,
> .sync_single_for_device = armada_xp_dma_sync,
> .sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu,
> .sync_sg_for_device = arm_dma_ops.sync_sg_for_device,
> };

I was too fast to reply this struct is wrong, it should be as this one:

struct dma_map_ops *dma_ops = {
.alloc = arm_coherent_dma_ops.arm_coherent_dma_alloc,
.free = arm_coherent_dma_ops.arm_coherent_dma_free,
.mmap = arm_coherent_dma_ops.arm_dma_mmap,
.get_sgtable = arm_coherent_dma_ops.arm_dma_get_sgtable,
.map_sg = arm_coherent_dma_ops.arm_dma_map_sg,
.set_dma_mask = arm_coherent_dma_ops.arm_dma_set_mask,
.map_page = armada_xp_dma_map_page,
.unmap_page = armada_xp_dma_unmap_page,
.unmap_sg = arm_dma_ops.unmap_sg,
.sync_single_for_cpu = armada_xp_dma_sync,
.sync_single_for_device = armada_xp_dma_sync,
.sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_ops.sync_sg_for_device,
};

Thanks,
Gregory

2012-10-24 11:56:10

by Thomas Petazzoni

[permalink] [raw]
Subject: Re: [PATCH 1/2] arm: plat-orion: Add coherency attribute when setup mbus target


On Wed, 24 Oct 2012 10:04:00 +0200, Gregory CLEMENT wrote:
> Recent SoC such as Armada 370/XP came with the possibility to deal
> with the I/O coherency by hardware. In this case the transaction
> attribute of the window must be flagged as "Shared transaction". Once
> this flag is set, then the transactions will be forced to be sent
> through the coherency block, in other case transaction is driven
> directly to DRAM.
>
> Signed-off-by: Gregory CLEMENT <[email protected]>
> Reviewed-by: Yehuda Yitschak <[email protected]>

Acked-by: Thomas Petazzoni <[email protected]>
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 12:25:07

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On Wednesday 24 October 2012, Gregory CLEMENT wrote:
> On 10/24/2012 01:48 PM, Gregory CLEMENT wrote:
> > On 10/24/2012 01:36 PM, Arnd Bergmann wrote:
> >>
> >> I think it would be cleaner to statically define the operations in a constant
> >> structure and point directly to the functions you need. If necessary, use
> >> multiple structures.
> >
> > My problem was that these functions are not exposed, only arm_dma_op and
> > arm_coherent_dma_ops are exported.
> > Or do you think about something like this:
>
> struct dma_map_ops *dma_ops = {
> .alloc = arm_coherent_dma_ops.arm_coherent_dma_alloc,
> .free = arm_coherent_dma_ops.arm_coherent_dma_free,
> .mmap = arm_coherent_dma_ops.arm_dma_mmap,
> .get_sgtable = arm_coherent_dma_ops.arm_dma_get_sgtable,
> .map_sg = arm_coherent_dma_ops.arm_dma_map_sg,
> .set_dma_mask = arm_coherent_dma_ops.arm_dma_set_mask,
> .map_page = armada_xp_dma_map_page,
> .unmap_page = armada_xp_dma_unmap_page,
> .unmap_sg = arm_dma_ops.unmap_sg,
> .sync_single_for_cpu = armada_xp_dma_sync,
> .sync_single_for_device = armada_xp_dma_sync,
> .sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu,
> .sync_sg_for_device = arm_dma_ops.sync_sg_for_device,
> };

No, I was thinking of making the underlying functions globally visible
and have extern declarations in a header file so you can access them
directly.

Generally speaking, when you run into a problem with common code, your
first approach should be to fix the common code before you try to work
around it.

Arnd

2012-10-24 12:27:42

by Thomas Petazzoni

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

Hello,

On Wed, 24 Oct 2012 10:04:01 +0200, Gregory CLEMENT wrote:
> Armada 370 and XP come with an unit called coherency fabric. This unit
> allows to use the Armada XP as a nearly coherent architecture. The

the Armada 370/XP

> coherency mechanism uses snoop filters to ensure the coherency between
> caches, DRAM and devices. This mechanism needs a synchronization
> barrier which guarantees that all memory write initiated by the

write -> writes

> devices has reached their target and do not reside in intermediate

has -> have

> write buffers. That's why the architecture is not totally coherent and
> we need to provide our own functions for some DMA operations.
>
> Beside the use of the coherency fabric, the device units will have to
> set the attribute flag to select the accurate coherency process for

the attribute flag of what?

> the memory transaction. This is done each device driver programs the
> DRAM address windows. The value of the attribute set by the driver is
> retrieved through the orion_addr_map_cfg struct filled during the
> early initialization of the platform.


> diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
> index 18ba60b..af22e53 100644
> --- a/arch/arm/boot/dts/armada-370-xp.dtsi
> +++ b/arch/arm/boot/dts/armada-370-xp.dtsi
> @@ -38,7 +38,8 @@
>
> coherency-fabric@d0020200 {
> compatible = "marvell,coherency-fabric";
> - reg = <0xd0020200 0xb0>;
> + reg = <0xd0020200 0xb0>,
> + <0xd0021010 0x1c>;
> };

As others mentioned, documentation update needed.

>
> soc {
> diff --git a/arch/arm/mach-mvebu/addr-map.c b/arch/arm/mach-mvebu/addr-map.c
> index fe454a4..595f6b7 100644
> --- a/arch/arm/mach-mvebu/addr-map.c
> +++ b/arch/arm/mach-mvebu/addr-map.c
> @@ -108,6 +108,9 @@ static int __init armada_setup_cpu_mbus(void)
>
> addr_map_cfg.bridge_virt_base = mbus_unit_addr_decoding_base;
>
> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
> + addr_map_cfg.hw_io_coherency = 1;
> +

I don't really like to have this of_find_compatible_node(NULL, NULL,
"marvell,coherency-fabric") test in two places, but I don't see a way
around it since armada_setup_cpu_mbus() is called as an initcall, so we
can't pass arguments to it.

> +struct dma_map_ops armada_xp_dma_ops;

static

> +static inline void armada_xp_sync_io_barrier(void)
> +{
> + writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
> + while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET) & 0x1);
> +}
> +
> +dma_addr_t armada_xp_dma_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir,
> + struct dma_attrs *attrs)
> +{
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> + return pfn_to_dma(dev, page_to_pfn(page)) + offset;
> +}
> +
> +
> +void armada_xp_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction dir,
> + struct dma_attrs *attrs)
> +{
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> +}
> +
> +void armada_xp_dma_sync(struct device *dev, dma_addr_t dma_handle,
> + size_t size, enum dma_data_direction dir)
> +{
> + if (dir != DMA_TO_DEVICE)
> + armada_xp_sync_io_barrier();
> +}

As others said, the prefix is wrong. Since the file is named coherency,
maybe just "coherency_" as the prefix? Not sure, though. Shouldn't the
file be named coherency-armada-370-xp.c, as we have done for the irq
controller file? In that case, the armada_370_xp prefix would make
sense.

> int armada_xp_get_cpu_count(void)
> {
> int reg, cnt;

static?

> @@ -74,6 +115,42 @@ int armada_370_xp_set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
> return 0;
> }
>
> +static int armada_xp_platform_notifier(struct notifier_block *nb,
> + unsigned long event, void *__dev)
> +{
> + struct device *dev = __dev;
> +
> + if (event != BUS_NOTIFY_ADD_DEVICE)
> + return NOTIFY_DONE;
> + set_dma_ops(dev, &armada_xp_dma_ops);
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block armada_xp_platform_nb = {
> + .notifier_call = armada_xp_platform_notifier,
> +};
> +
> +void __init armada_370_xp_coherency_iocache_init(void)
> +{
> + /* When the coherency fabric is available, the Armada XP and
> + * Aramada 370 are close to a coherent architecture, so we based
> + * our dma ops on the coherent one, and just changes the
> + * operations which need a arch io sync */
> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
> + memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
> + dma_ops->map_page = armada_xp_dma_map_page;
> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
> + dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
> + dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
> + }
> + bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);

As Arnd said, I would prefer a lot to have the armada_xp_dma_ops
structure be initialized statically, even though it means that if the
arm_coherent_dma_ops structure is changed, we'll have to update here as
well. But I think it's cleaner.

Also, the bus notifier should be registered only if we have the
coherency fabric, otherwise it is anyway going to be called, setting
invalid DMA operations for all the platform devices.

So:

static dma_map_ops armada_370_xp_dma_ops = {
...
};

void __init armada_370_xp_coherency_iocache_init(void)
{
if (! of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
return;

bus_register_notifier(&platform_bus_type, &armada_370_xp_platform_nb);
}

> int __init armada_370_xp_coherency_init(void)
> {
> struct device_node *np;
> @@ -82,7 +159,17 @@ int __init armada_370_xp_coherency_init(void)
> if (np) {
> pr_info("Initializing Coherency fabric\n");
> coherency_base = of_iomap(np, 0);
> + coherency_cpu_base = of_iomap(np, 1);
> + }
> +#ifndef CONFIG_SMP
> + if (coherency_base) {
> + /* In UP case, cpu coherency is enabled here, in SMP case cpu
> + * coherency is enabled for each CPU by
> + * armada_xp_smp_prepare_cpus() in platsmp.c */
> + int hw_cpuid = cpu_logical_map(smp_processor_id());
> + armada_370_xp_set_cpu_coherent(hw_cpuid, 0);
> }
> +#endif

I really don't like this part of the code. First, the test is done on
"coherency_base", while armada_370_xp_coherency_iocache_init() is
checking the existence of the DT node to find if we have a coherency
fabric or not. It should be consistent.

Then, I don't see why the code patch for SMP and UP should be
different. The code in platsmp.c:armada_xp_smp_prepare_cpus() only
enables the coherency unit for the boot CPU. The other CPUs are
enabling the mechanism in the assembly code in headsmp.S. In other
words, I think you can remove the call to
armada_370_xp_set_cpu_coherent() in armada_xp_smp_prepare_cpus(), and
make the call unconditionally here in coherency.c for the boot CPU.

It seems like there is a better way to do this.

> diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-mvebu/common.h
> index 86484bb..fff952e 100644
> --- a/arch/arm/mach-mvebu/common.h
> +++ b/arch/arm/mach-mvebu/common.h
> @@ -23,6 +23,8 @@ void armada_370_xp_handle_irq(struct pt_regs *regs);
>
> void armada_xp_cpu_die(unsigned int cpu);
>
> +void armada_370_xp_coherency_iocache_init(void);
> +
> int armada_370_xp_coherency_init(void);
> int armada_370_xp_pmsu_init(void);
> void armada_xp_secondary_startup(void);

Do we have a good reason for having
armada_370_xp_coherency_iocache_init() separate from
armada_370_xp_coherency_iocache_init() ? I.e, what prevents us from
registering the bus notifier directly in armada_370_xp_coherency_init()
if the coherency unit is available?

Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 13:56:47

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On 10/24/2012 02:24 PM, Arnd Bergmann wrote:
> On Wednesday 24 October 2012, Gregory CLEMENT wrote:
>> On 10/24/2012 01:48 PM, Gregory CLEMENT wrote:
>>> On 10/24/2012 01:36 PM, Arnd Bergmann wrote:
>>>>
>>>> I think it would be cleaner to statically define the operations in a constant
>>>> structure and point directly to the functions you need. If necessary, use
>>>> multiple structures.
>>>
>>> My problem was that these functions are not exposed, only arm_dma_op and
>>> arm_coherent_dma_ops are exported.
>>> Or do you think about something like this:
>>
>> struct dma_map_ops *dma_ops = {
>> .alloc = arm_coherent_dma_ops.arm_coherent_dma_alloc,
>> .free = arm_coherent_dma_ops.arm_coherent_dma_free,
>> .mmap = arm_coherent_dma_ops.arm_dma_mmap,
>> .get_sgtable = arm_coherent_dma_ops.arm_dma_get_sgtable,
>> .map_sg = arm_coherent_dma_ops.arm_dma_map_sg,
>> .set_dma_mask = arm_coherent_dma_ops.arm_dma_set_mask,
>> .map_page = armada_xp_dma_map_page,
>> .unmap_page = armada_xp_dma_unmap_page,
>> .unmap_sg = arm_dma_ops.unmap_sg,
>> .sync_single_for_cpu = armada_xp_dma_sync,
>> .sync_single_for_device = armada_xp_dma_sync,
>> .sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu,
>> .sync_sg_for_device = arm_dma_ops.sync_sg_for_device,
>> };
>
> No, I was thinking of making the underlying functions globally visible
> and have extern declarations in a header file so you can access them
> directly.
>
> Generally speaking, when you run into a problem with common code, your
> first approach should be to fix the common code before you try to work
> around it.

OK I thought it was done on purpose. But if you consider it needs to be
fixed I will add patch for it in next version.

>
> Arnd
>


--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 14:39:35

by Gregory CLEMENT

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On 10/24/2012 02:27 PM, Thomas Petazzoni wrote:
[...]
I will fixed the spelling and complete the comments as suggested

[...]

>> +struct dma_map_ops armada_xp_dma_ops;
>
> static

OK

>
>> +static inline void armada_xp_sync_io_barrier(void)
>> +{
>> + writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
>> + while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET) & 0x1);
>> +}
>> +
>> +dma_addr_t armada_xp_dma_map_page(struct device *dev, struct page *page,
>> + unsigned long offset, size_t size,
>> + enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + armada_xp_sync_io_barrier();
>> + return pfn_to_dma(dev, page_to_pfn(page)) + offset;
>> +}
>> +
>> +
>> +void armada_xp_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>> + size_t size, enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + armada_xp_sync_io_barrier();
>> +}
>> +
>> +void armada_xp_dma_sync(struct device *dev, dma_addr_t dma_handle,
>> + size_t size, enum dma_data_direction dir)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + armada_xp_sync_io_barrier();
>> +}
>
> As others said, the prefix is wrong. Since the file is named coherency,
> maybe just "coherency_" as the prefix? Not sure, though. Shouldn't the
> file be named coherency-armada-370-xp.c, as we have done for the irq
> controller file? In that case, the armada_370_xp prefix would make
> sense

I would prefer to just use coherency_ everywhere and keep the name of
the file as is. It made sense to suffix the irq file with
armada-370-xp because the other mvebu SoCs have their own irq
controller. Whereas, as far as I know the coherency unit is only
present in the Armada 370/XP.

>
>> int armada_xp_get_cpu_count(void)
>> {
>> int reg, cnt;
>
> static?

Which function?
armada_xp_get_cpu_count and armada_370_xp_set_cpu_coherent are exported
and moreover are not part of this patch set but the SMP one.

>
>> @@ -74,6 +115,42 @@ int armada_370_xp_set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
>> return 0;
>> }
>>
>> +static int armada_xp_platform_notifier(struct notifier_block *nb,
>> + unsigned long event, void *__dev)
>> +{
>> + struct device *dev = __dev;
>> +
>> + if (event != BUS_NOTIFY_ADD_DEVICE)
>> + return NOTIFY_DONE;
>> + set_dma_ops(dev, &armada_xp_dma_ops);
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block armada_xp_platform_nb = {
>> + .notifier_call = armada_xp_platform_notifier,
>> +};
>> +
>> +void __init armada_370_xp_coherency_iocache_init(void)
>> +{
>> + /* When the coherency fabric is available, the Armada XP and
>> + * Aramada 370 are close to a coherent architecture, so we based
>> + * our dma ops on the coherent one, and just changes the
>> + * operations which need a arch io sync */
>> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric")) {
>> + struct dma_map_ops *dma_ops = &armada_xp_dma_ops;
>> + memcpy(dma_ops, &arm_coherent_dma_ops, sizeof(*dma_ops));
>> + dma_ops->map_page = armada_xp_dma_map_page;
>> + dma_ops->unmap_page = armada_xp_dma_unmap_page;
>> + dma_ops->unmap_sg = arm_dma_ops.unmap_sg;
>> + dma_ops->sync_single_for_cpu = armada_xp_dma_sync;
>> + dma_ops->sync_single_for_device = armada_xp_dma_sync;
>> + dma_ops->sync_sg_for_cpu = arm_dma_ops.sync_sg_for_cpu;
>> + dma_ops->sync_sg_for_device = arm_dma_ops.sync_sg_for_device;
>> + }
>> + bus_register_notifier(&platform_bus_type, &armada_xp_platform_nb);
>
> As Arnd said, I would prefer a lot to have the armada_xp_dma_ops
> structure be initialized statically, even though it means that if the
> arm_coherent_dma_ops structure is changed, we'll have to update here as
> well. But I think it's cleaner.
>
> Also, the bus notifier should be registered only if we have the
> coherency fabric, otherwise it is anyway going to be called, setting
> invalid DMA operations for all the platform devices.
>
> So:
>
> static dma_map_ops armada_370_xp_dma_ops = {
> ...
> };
>
> void __init armada_370_xp_coherency_iocache_init(void)
> {
> if (! of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
> return;
>
> bus_register_notifier(&platform_bus_type, &armada_370_xp_platform_nb);
> }

OK I agree

>
>> int __init armada_370_xp_coherency_init(void)
>> {
>> struct device_node *np;
>> @@ -82,7 +159,17 @@ int __init armada_370_xp_coherency_init(void)
>> if (np) {
>> pr_info("Initializing Coherency fabric\n");
>> coherency_base = of_iomap(np, 0);
>> + coherency_cpu_base = of_iomap(np, 1);
>> + }
>> +#ifndef CONFIG_SMP
>> + if (coherency_base) {
>> + /* In UP case, cpu coherency is enabled here, in SMP case cpu
>> + * coherency is enabled for each CPU by
>> + * armada_xp_smp_prepare_cpus() in platsmp.c */
>> + int hw_cpuid = cpu_logical_map(smp_processor_id());
>> + armada_370_xp_set_cpu_coherent(hw_cpuid, 0);
>> }
>> +#endif
>
> I really don't like this part of the code. First, the test is done on
> "coherency_base", while armada_370_xp_coherency_iocache_init() is
> checking the existence of the DT node to find if we have a coherency
> fabric or not. It should be consistent.

OK

>
> Then, I don't see why the code patch for SMP and UP should be
> different. The code in platsmp.c:armada_xp_smp_prepare_cpus() only
> enables the coherency unit for the boot CPU. The other CPUs are
> enabling the mechanism in the assembly code in headsmp.S. In other
> words, I think you can remove the call to
> armada_370_xp_set_cpu_coherent() in armada_xp_smp_prepare_cpus(), and
> make the call unconditionally here in coherency.c for the boot CPU.
>
> It seems like there is a better way to do this.

Yes indeed code in headsmp.S and armada_xp_smp_prepare_cpus() are
redundant we can simplify it. I will change it in the SMP series and
for this series also.

>
>> diff --git a/arch/arm/mach-mvebu/common.h b/arch/arm/mach-mvebu/common.h
>> index 86484bb..fff952e 100644
>> --- a/arch/arm/mach-mvebu/common.h
>> +++ b/arch/arm/mach-mvebu/common.h
>> @@ -23,6 +23,8 @@ void armada_370_xp_handle_irq(struct pt_regs *regs);
>>
>> void armada_xp_cpu_die(unsigned int cpu);
>>
>> +void armada_370_xp_coherency_iocache_init(void);
>> +
>> int armada_370_xp_coherency_init(void);
>> int armada_370_xp_pmsu_init(void);
>> void armada_xp_secondary_startup(void);
>
> Do we have a good reason for having
> armada_370_xp_coherency_iocache_init() separate from
> armada_370_xp_coherency_iocache_init() ? I.e, what prevents us from

No good reason because they are the same! ;)

But more seriously, armada_370_xp_coherency_init() is called as an
early_init call whereas armada_370_xp_coherency_iocache_init() is called
later by armada_370_xp_dt_init(). I have to check if we can use
bus_register_notifier() from an early_init call or if we still need to
call armada_370_xp_coherency_init() as an early_init call.

> registering the bus notifier directly in armada_370_xp_coherency_init()
> if the coherency unit is available?

Thanks,

Gregory

2012-10-24 14:56:18

by Thomas Petazzoni

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

Dear Gregory CLEMENT,

On Wed, 24 Oct 2012 16:39:13 +0200, Gregory CLEMENT wrote:

> > As others said, the prefix is wrong. Since the file is named coherency,
> > maybe just "coherency_" as the prefix? Not sure, though. Shouldn't the
> > file be named coherency-armada-370-xp.c, as we have done for the irq
> > controller file? In that case, the armada_370_xp prefix would make
> > sense
>
> I would prefer to just use coherency_ everywhere and keep the name of
> the file as is. It made sense to suffix the irq file with
> armada-370-xp because the other mvebu SoCs have their own irq
> controller. Whereas, as far as I know the coherency unit is only
> present in the Armada 370/XP.

Well your argumentation may also be seen as an argument to name it
coherency-armada-370-xp.c :-) I don't have a strong opinion on this
though, and we can always rename it later if needed.

> >> int armada_xp_get_cpu_count(void)
> >> {
> >> int reg, cnt;
> >
> > static?
>
> Which function?
> armada_xp_get_cpu_count and armada_370_xp_set_cpu_coherent are exported
> and moreover are not part of this patch set but the SMP one.

Ok, then maybe this function shouldn't be between the DMA operations
implementation and the bus notifier callback.

> Yes indeed code in headsmp.S and armada_xp_smp_prepare_cpus() are
> redundant we can simplify it. I will change it in the SMP series and
> for this series also.

Great, thanks!

> > Do we have a good reason for having
> > armada_370_xp_coherency_iocache_init() separate from
> > armada_370_xp_coherency_iocache_init() ? I.e, what prevents us from
>
> No good reason because they are the same! ;)

Oops, indeed. but I see that you fixed the problem.

> But more seriously, armada_370_xp_coherency_init() is called as an
> early_init call whereas armada_370_xp_coherency_iocache_init() is called
> later by armada_370_xp_dt_init(). I have to check if we can use
> bus_register_notifier() from an early_init call or if we still need to
> call armada_370_xp_coherency_init() as an early_init call.

The early_initcall() is mandatory, see the comment on top of it:

+/* Coherency initialization have to be done before the SMP
+ * initialization of the CPUs*/
+early_initcall(armada_370_xp_coherency_init);

(BTW, there is a missing space between CPUs and the comment terminator).

So, my point was more: is it possible to register the bus notifier
as early as inside an early_initcall() callback.

Best regards,

Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-10-24 20:30:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 2/2] arm: mvebu: Add hardware I/O Coherency support

On Wednesday 24 October 2012, Gregory CLEMENT wrote:
> > No, I was thinking of making the underlying functions globally visible
> > and have extern declarations in a header file so you can access them
> > directly.
> >
> > Generally speaking, when you run into a problem with common code, your
> > first approach should be to fix the common code before you try to work
> > around it.
>
> OK I thought it was done on purpose. But if you consider it needs to be
> fixed I will add patch for it in next version.

As long as the functions were only used locally in one file, it's
better not to make them globally visible as a rule. But if we know
that they are needed, that should be made explicit. Please fix this
by submitting another patch to make those functions global and then
rebase this series on top of that.

Arnd