LinuxLists.cc - [PATCH 0/3] Introduce atomic MMIO register clear-set

2013-08-10 12:43:09

Subject: [PATCH 0/3] Introduce atomic MMIO register clear-set

Following a suggestion from Sebastian Hesselbarth and Russell King
here's some work to introduce a generic thread-safe clear-set register
access.

The original motivation for this comes from the need to access the same
register from a clocksource driver and a watchdog driver as we find
in Kirkwood, Armada 370/XP SoCs.

Since this sort of design is expected to appear in other platforms,
instead of exporting platform-specific {mvebu,orion}_clear_set() functions
for the thread-safe access, this patchset implements a system-wide API.

Although this is placed in arm/kernel/io.c, there's nothing
ARM-specific in the API and should probably me moved somewhere else.

Suggestions on where to put it are appreciated.

Using this API it's possible -for instance- to add support for Armada 370/XP
in the orion_wdt driver. That work is ready, and it's been hold until we
decide a proper solution for shared-register access.

Based in v3.11-rc4.

Ezequiel Garcia (3):
ARM: Introduce atomic MMIO clear/set
clocksource: orion: Use atomic access for shared registers
watchdog: orion: Use atomic access for shared registers

arch/arm/include/asm/io.h | 5 +++++
arch/arm/kernel/io.c | 24 ++++++++++++++++++++++++
drivers/clocksource/time-orion.c | 9 ++-------
drivers/watchdog/orion_wdt.c | 8 ++------
4 files changed, 33 insertions(+), 13 deletions(-)

--
1.8.1.5

2013-08-10 12:43:26

by Ezequiel Garcia

[permalink] [raw]

Subject: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

Some SoC have MMIO regions that are shared across orthogonal
subsystems. This commit implements a possible solution for the
thread-safe access of such regions through a spinlock-protected API
with clear-set semantics.

Concurrent access is protected with a single spinlock for the
entire MMIO address space. While this protects shared-registers,
it also serializes access to unrelated/unshared registers.

Signed-off-by: Ezequiel Garcia <[email protected]>
---
arch/arm/include/asm/io.h | 5 +++++
arch/arm/kernel/io.c | 24 ++++++++++++++++++++++++
2 files changed, 29 insertions(+)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index d070741..c84658d 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -36,6 +36,11 @@
#define isa_bus_to_virt phys_to_virt

/*
+ * Atomic MMIO-wide IO clear/set
+ */
+extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
+
+/*
* Generic IO read/write. These perform native-endian accesses. Note
* that some architectures will want to re-define __raw_{read,write}w.
*/
diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
index dcd5b4d..3ab8201 100644
--- a/arch/arm/kernel/io.c
+++ b/arch/arm/kernel/io.c
@@ -1,6 +1,30 @@
#include <linux/export.h>
#include <linux/types.h>
#include <linux/io.h>
+#include <linux/spinlock.h>
+
+static DEFINE_SPINLOCK(__io_lock);
+
+/*
+ * Some platforms have MMIO regions that are shared across orthogonal
+ * subsystems. This API implements thread-safe access to such regions
+ * through a spinlock-protected API with clear-set semantics.
+ *
+ * Concurrent access is protected with a single spinlock for the entire MMIO
+ * address space. While this protects shared-registers, it also serializes
+ * access to unrelated/unshared registers.
+ *
+ * Using this API on frequently accessed registers in performance-critical
+ * paths is not recommended, as the spinlock used by this API would become
+ * highly contended.
+ */
+void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
+{
+ spin_lock(&__io_lock);
+ writel((readl(reg) & ~clear) | set, reg);
+ spin_unlock(&__io_lock);
+}
+EXPORT_SYMBOL(atomic_io_clear_set);

/*
* Copy data from IO memory space to "real" memory space.
--
1.8.1.5

2013-08-10 12:43:52

by Ezequiel Garcia

[permalink] [raw]

Subject: [PATCH 2/3] clocksource: orion: Use atomic access for shared registers

Replace the driver-specific thread-safe shared register API
by the recently introduced atomic_io_clear_set().

Signed-off-by: Ezequiel Garcia <[email protected]>
---
drivers/clocksource/time-orion.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/clocksource/time-orion.c b/drivers/clocksource/time-orion.c
index ecbeb68..4334ea7 100644
--- a/drivers/clocksource/time-orion.c
+++ b/drivers/clocksource/time-orion.c
@@ -35,20 +35,15 @@
#define ORION_ONESHOT_MAX 0xfffffffe

static void __iomem *timer_base;
-static DEFINE_SPINLOCK(timer_ctrl_lock);

/*
* Thread-safe access to TIMER_CTRL register
* (shared with watchdog timer)
*/
-void orion_timer_ctrl_clrset(u32 clr, u32 set)
+static void orion_timer_ctrl_clrset(u32 clr, u32 set)
{
- spin_lock(&timer_ctrl_lock);
- writel((readl(timer_base + TIMER_CTRL) & ~clr) | set,
- timer_base + TIMER_CTRL);
- spin_unlock(&timer_ctrl_lock);
+ atomic_io_clear_set(timer_base + TIMER_CTRL, clr, set);
}
-EXPORT_SYMBOL(orion_timer_ctrl_clrset);

/*
* Free-running clocksource handling.
--
1.8.1.5

2013-08-10 12:43:54

by Ezequiel Garcia

[permalink] [raw]

Subject: [PATCH 3/3] watchdog: orion: Use atomic access for shared registers

Since the timer control register is shared with the clocksource driver,
use the recently introduced atomic_io_clear_set() to access such register.

Signed-off-by: Ezequiel Garcia <[email protected]>
---
drivers/watchdog/orion_wdt.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
index 4ea5fcc..de35ae9 100644
--- a/drivers/watchdog/orion_wdt.c
+++ b/drivers/watchdog/orion_wdt.c
@@ -73,9 +73,7 @@ static int orion_wdt_start(struct watchdog_device *wdt_dev)
writel(~WDT_INT_REQ, BRIDGE_CAUSE);

/* Enable watchdog timer */
- reg = readl(wdt_reg + TIMER_CTRL);
- reg |= WDT_EN;
- writel(reg, wdt_reg + TIMER_CTRL);
+ atomic_io_clear_set(wdt_reg + TIMER_CTRL, 0, WDT_EN);

/* Enable reset on watchdog */
reg = readl(RSTOUTn_MASK);
@@ -98,9 +96,7 @@ static int orion_wdt_stop(struct watchdog_device *wdt_dev)
writel(reg, RSTOUTn_MASK);

/* Disable watchdog timer */
- reg = readl(wdt_reg + TIMER_CTRL);
- reg &= ~WDT_EN;
- writel(reg, wdt_reg + TIMER_CTRL);
+ atomic_io_clear_set(wdt_reg + TIMER_CTRL, WDT_EN, 0);

spin_unlock(&wdt_lock);
return 0;
--
1.8.1.5

2013-08-10 12:49:59

by Alexander Shiyan

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

> Some SoC have MMIO regions that are shared across orthogonal
> subsystems. This commit implements a possible solution for the
> thread-safe access of such regions through a spinlock-protected API
> with clear-set semantics.
>
> Concurrent access is protected with a single spinlock for the
> entire MMIO address space. While this protects shared-registers,
> it also serializes access to unrelated/unshared registers.
>
> Signed-off-by: Ezequiel Garcia <[email protected]>
> ---
> arch/arm/include/asm/io.h | 5 +++++
> arch/arm/kernel/io.c | 24 ++++++++++++++++++++++++
> 2 files changed, 29 insertions(+)
>
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index d070741..c84658d 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -36,6 +36,11 @@
> #define isa_bus_to_virt phys_to_virt
>
> /*
> + * Atomic MMIO-wide IO clear/set
> + */
> +extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
> +
> +/*
> * Generic IO read/write. These perform native-endian accesses. Note
> * that some architectures will want to re-define __raw_{read,write}w.
> */
> diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
> index dcd5b4d..3ab8201 100644
> --- a/arch/arm/kernel/io.c
> +++ b/arch/arm/kernel/io.c
> @@ -1,6 +1,30 @@
> #include <linux/export.h>
> #include <linux/types.h>
> #include <linux/io.h>
> +#include <linux/spinlock.h>
> +
> +static DEFINE_SPINLOCK(__io_lock);
> +
> +/*
> + * Some platforms have MMIO regions that are shared across orthogonal
> + * subsystems. This API implements thread-safe access to such regions
> + * through a spinlock-protected API with clear-set semantics.
> + *
> + * Concurrent access is protected with a single spinlock for the entire MMIO
> + * address space. While this protects shared-registers, it also serializes
> + * access to unrelated/unshared registers.
> + *
> + * Using this API on frequently accessed registers in performance-critical
> + * paths is not recommended, as the spinlock used by this API would become
> + * highly contended.
> + */
> +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> +{
> + spin_lock(&__io_lock);
> + writel((readl(reg) & ~clear) | set, reg);
> + spin_unlock(&__io_lock);
> +}
> +EXPORT_SYMBOL(atomic_io_clear_set);

So, one lock is used to all possible registers?
Seems a regmap-mmio can be used for such access.

---
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2013-08-10 14:02:44

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > Some SoC have MMIO regions that are shared across orthogonal
> > subsystems. This commit implements a possible solution for the
> > thread-safe access of such regions through a spinlock-protected API
> > with clear-set semantics.
> >
> > Concurrent access is protected with a single spinlock for the
> > entire MMIO address space. While this protects shared-registers,
> > it also serializes access to unrelated/unshared registers.
> >
> > Signed-off-by: Ezequiel Garcia <[email protected]>
> > ---
> > arch/arm/include/asm/io.h | 5 +++++
> > arch/arm/kernel/io.c | 24 ++++++++++++++++++++++++
> > 2 files changed, 29 insertions(+)
> >
> > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> > index d070741..c84658d 100644
> > --- a/arch/arm/include/asm/io.h
> > +++ b/arch/arm/include/asm/io.h
> > @@ -36,6 +36,11 @@
> > #define isa_bus_to_virt phys_to_virt
> >
> > /*
> > + * Atomic MMIO-wide IO clear/set
> > + */
> > +extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
> > +
> > +/*
> > * Generic IO read/write. These perform native-endian accesses. Note
> > * that some architectures will want to re-define __raw_{read,write}w.
> > */
> > diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
> > index dcd5b4d..3ab8201 100644
> > --- a/arch/arm/kernel/io.c
> > +++ b/arch/arm/kernel/io.c
> > @@ -1,6 +1,30 @@
> > #include <linux/export.h>
> > #include <linux/types.h>
> > #include <linux/io.h>
> > +#include <linux/spinlock.h>
> > +
> > +static DEFINE_SPINLOCK(__io_lock);
> > +
> > +/*
> > + * Some platforms have MMIO regions that are shared across orthogonal
> > + * subsystems. This API implements thread-safe access to such regions
> > + * through a spinlock-protected API with clear-set semantics.
> > + *
> > + * Concurrent access is protected with a single spinlock for the entire MMIO
> > + * address space. While this protects shared-registers, it also serializes
> > + * access to unrelated/unshared registers.
> > + *
> > + * Using this API on frequently accessed registers in performance-critical
> > + * paths is not recommended, as the spinlock used by this API would become
> > + * highly contended.
> > + */
> > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > +{
> > + spin_lock(&__io_lock);
> > + writel((readl(reg) & ~clear) | set, reg);
> > + spin_unlock(&__io_lock);
> > +}
> > +EXPORT_SYMBOL(atomic_io_clear_set);
>
> So, one lock is used to all possible registers?
> Seems a regmap-mmio can be used for such access.
>

Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.

However, after reading some code, I fail to see how that helps in this case.

Note that we need to access the *same* MMIO address from completely
different (and unrelated) drivers, such as watchdog and clocksource.

So I wonder who would "own" the regmap descriptor, and how does the other
one gets aware of that descriptor?

In addition given we can use orion_wdt (originally meant for mach-kirkwood)
to support mvebu SoC watchdog, we need to sort this out in a completely
multiplatform capable way.

Ideas?
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-10 14:09:33

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > Some SoC have MMIO regions that are shared across orthogonal
> > > subsystems. This commit implements a possible solution for the
> > > thread-safe access of such regions through a spinlock-protected API
> > > with clear-set semantics.
> > >
> > > Concurrent access is protected with a single spinlock for the
> > > entire MMIO address space. While this protects shared-registers,
> > > it also serializes access to unrelated/unshared registers.
> > >
> > > Signed-off-by: Ezequiel Garcia <[email protected]>
> > > ---
> > > arch/arm/include/asm/io.h | 5 +++++
> > > arch/arm/kernel/io.c | 24 ++++++++++++++++++++++++
> > > 2 files changed, 29 insertions(+)
> > >
> > > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> > > index d070741..c84658d 100644
> > > --- a/arch/arm/include/asm/io.h
> > > +++ b/arch/arm/include/asm/io.h
> > > @@ -36,6 +36,11 @@
> > > #define isa_bus_to_virt phys_to_virt
> > >
> > > /*
> > > + * Atomic MMIO-wide IO clear/set
> > > + */
> > > +extern void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set);
> > > +
> > > +/*
> > > * Generic IO read/write. These perform native-endian accesses. Note
> > > * that some architectures will want to re-define __raw_{read,write}w.
> > > */
> > > diff --git a/arch/arm/kernel/io.c b/arch/arm/kernel/io.c
> > > index dcd5b4d..3ab8201 100644
> > > --- a/arch/arm/kernel/io.c
> > > +++ b/arch/arm/kernel/io.c
> > > @@ -1,6 +1,30 @@
> > > #include <linux/export.h>
> > > #include <linux/types.h>
> > > #include <linux/io.h>
> > > +#include <linux/spinlock.h>
> > > +
> > > +static DEFINE_SPINLOCK(__io_lock);
> > > +
> > > +/*
> > > + * Some platforms have MMIO regions that are shared across orthogonal
> > > + * subsystems. This API implements thread-safe access to such regions
> > > + * through a spinlock-protected API with clear-set semantics.
> > > + *
> > > + * Concurrent access is protected with a single spinlock for the entire MMIO
> > > + * address space. While this protects shared-registers, it also serializes
> > > + * access to unrelated/unshared registers.
> > > + *
> > > + * Using this API on frequently accessed registers in performance-critical
> > > + * paths is not recommended, as the spinlock used by this API would become
> > > + * highly contended.
> > > + */
> > > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > > +{
> > > + spin_lock(&__io_lock);
> > > + writel((readl(reg) & ~clear) | set, reg);
> > > + spin_unlock(&__io_lock);
> > > +}
> > > +EXPORT_SYMBOL(atomic_io_clear_set);
> >
> > So, one lock is used to all possible registers?
> > Seems a regmap-mmio can be used for such access.
> >
>
> Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
>
> However, after reading some code, I fail to see how that helps in this case.
>
> Note that we need to access the *same* MMIO address from completely
> different (and unrelated) drivers, such as watchdog and clocksource.
>
> So I wonder who would "own" the regmap descriptor, and how does the other
> one gets aware of that descriptor?
>
> In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> to support mvebu SoC watchdog, we need to sort this out in a completely
> multiplatform capable way.
>
> Ideas?

Answering myself...

How about using drivers/mfd/syscon.c to create the regmap owner for the shared
register (TIMER_CTRL in this case, but others might appear) ?

Or adding a new mfd implementation if syscon does not fit ?

Does this sound like an overkill ?

--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-10 15:43:42

by Alexander Shiyan

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

> On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > Some SoC have MMIO regions that are shared across orthogonal
> > > > subsystems. This commit implements a possible solution for the
> > > > thread-safe access of such regions through a spinlock-protected API
> > > > with clear-set semantics.
> > > >
> > > > Concurrent access is protected with a single spinlock for the
> > > > entire MMIO address space. While this protects shared-registers,
> > > > it also serializes access to unrelated/unshared registers.
[...]
> > > > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > > > +{
> > > > + spin_lock(&__io_lock);
> > > > + writel((readl(reg) & ~clear) | set, reg);
> > > > + spin_unlock(&__io_lock);
> > > > +}
> > > > +EXPORT_SYMBOL(atomic_io_clear_set);
> > >
> > > So, one lock is used to all possible registers?
> > > Seems a regmap-mmio can be used for such access.
> > >
> >
> > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> >
> > However, after reading some code, I fail to see how that helps in this case.
> >
> > Note that we need to access the *same* MMIO address from completely
> > different (and unrelated) drivers, such as watchdog and clocksource.
> >
> > So I wonder who would "own" the regmap descriptor, and how does the other
> > one gets aware of that descriptor?
> >
> > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > to support mvebu SoC watchdog, we need to sort this out in a completely
> > multiplatform capable way.
> >
> > Ideas?
>
> Answering myself...
>
> How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> register (TIMER_CTRL in this case, but others might appear) ?
>
> Or adding a new mfd implementation if syscon does not fit ?
>
> Does this sound like an overkill ?

Yes, syscon is designed especially for such cases.

---
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2013-08-10 15:55:59

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Sat, Aug 10, 2013 at 07:43:08PM +0400, Alexander Shiyan wrote:
> > On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > > Some SoC have MMIO regions that are shared across orthogonal
> > > > > subsystems. This commit implements a possible solution for the
> > > > > thread-safe access of such regions through a spinlock-protected API
> > > > > with clear-set semantics.
> > > > >
> > > > > Concurrent access is protected with a single spinlock for the
> > > > > entire MMIO address space. While this protects shared-registers,
> > > > > it also serializes access to unrelated/unshared registers.
> [...]
> > > > > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > > > > +{
> > > > > + spin_lock(&__io_lock);
> > > > > + writel((readl(reg) & ~clear) | set, reg);
> > > > > + spin_unlock(&__io_lock);
> > > > > +}
> > > > > +EXPORT_SYMBOL(atomic_io_clear_set);
> > > >
> > > > So, one lock is used to all possible registers?
> > > > Seems a regmap-mmio can be used for such access.
> > > >
> > >
> > > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> > >
> > > However, after reading some code, I fail to see how that helps in this case.
> > >
> > > Note that we need to access the *same* MMIO address from completely
> > > different (and unrelated) drivers, such as watchdog and clocksource.
> > >
> > > So I wonder who would "own" the regmap descriptor, and how does the other
> > > one gets aware of that descriptor?
> > >
> > > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > > to support mvebu SoC watchdog, we need to sort this out in a completely
> > > multiplatform capable way.
> > >
> > > Ideas?
> >
> > Answering myself...
> >
> > How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> > register (TIMER_CTRL in this case, but others might appear) ?
> >
> > Or adding a new mfd implementation if syscon does not fit ?
> >
> > Does this sound like an overkill ?
>
> Yes, syscon is designed especially for such cases.
>

Indeed, syscon looks like a nice match for this use case.
(although it still looks like an overkill to me).

I've been trying to implement a working solution based in syscon but I'm
unable to overcome an issue.

The problem is that we need the register/regmap to initialize the clocksource
driver for this machine (aka the timer). Of course, this happens at a
*very* early point, way before the syscon driver is available... :-(

Maybe someone has an idea?
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-12 15:46:52

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Sat, Aug 10, 2013 at 12:55:53PM -0300, Ezequiel Garcia wrote:
> On Sat, Aug 10, 2013 at 07:43:08PM +0400, Alexander Shiyan wrote:
> > > On Sat, Aug 10, 2013 at 11:02:38AM -0300, Ezequiel Garcia wrote:
> > > > On Sat, Aug 10, 2013 at 04:49:28PM +0400, Alexander Shiyan wrote:
> > > > > > Some SoC have MMIO regions that are shared across orthogonal
> > > > > > subsystems. This commit implements a possible solution for the
> > > > > > thread-safe access of such regions through a spinlock-protected API
> > > > > > with clear-set semantics.
> > > > > >
> > > > > > Concurrent access is protected with a single spinlock for the
> > > > > > entire MMIO address space. While this protects shared-registers,
> > > > > > it also serializes access to unrelated/unshared registers.
> > [...]
> > > > > > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > > > > > +{
> > > > > > + spin_lock(&__io_lock);
> > > > > > + writel((readl(reg) & ~clear) | set, reg);
> > > > > > + spin_unlock(&__io_lock);
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(atomic_io_clear_set);
> > > > >
> > > > > So, one lock is used to all possible registers?
> > > > > Seems a regmap-mmio can be used for such access.
> > > > >
> > > >
> > > > Thanks for the hint! Quite frankly, I wasn't familiar with regmap-mmio.
> > > >
> > > > However, after reading some code, I fail to see how that helps in this case.
> > > >
> > > > Note that we need to access the *same* MMIO address from completely
> > > > different (and unrelated) drivers, such as watchdog and clocksource.
> > > >
> > > > So I wonder who would "own" the regmap descriptor, and how does the other
> > > > one gets aware of that descriptor?
> > > >
> > > > In addition given we can use orion_wdt (originally meant for mach-kirkwood)
> > > > to support mvebu SoC watchdog, we need to sort this out in a completely
> > > > multiplatform capable way.
> > > >
> > > > Ideas?
> > >
> > > Answering myself...
> > >
> > > How about using drivers/mfd/syscon.c to create the regmap owner for the shared
> > > register (TIMER_CTRL in this case, but others might appear) ?
> > >
> > > Or adding a new mfd implementation if syscon does not fit ?
> > >
> > > Does this sound like an overkill ?
> >
> > Yes, syscon is designed especially for such cases.
> >
>
> Indeed, syscon looks like a nice match for this use case.
> (although it still looks like an overkill to me).
>
> I've been trying to implement a working solution based in syscon but I'm
> unable to overcome an issue.
>
> The problem is that we need the register/regmap to initialize the clocksource
> driver for this machine (aka the timer). Of course, this happens at a
> *very* early point, way before the syscon driver is available... :-(
>
> Maybe someone has an idea?

Sebastian, Russell: I can't find the previous mail where you proposed
this solution to address the shared register issue between Kirkwood's
watchdog and clocksource.

Is this what you had in mind?

Do you think trying to use a regmap could be better (given we can
sort out the problem explained above)?
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-12 16:44:17

by Sebastian Hesselbarth

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On 08/12/13 17:46, Ezequiel Garcia wrote:
>> Indeed, syscon looks like a nice match for this use case.
>> (although it still looks like an overkill to me).
>>
>> I've been trying to implement a working solution based in syscon but I'm
>> unable to overcome an issue.
>>
>> The problem is that we need the register/regmap to initialize the clocksource
>> driver for this machine (aka the timer). Of course, this happens at a
>> *very* early point, way before the syscon driver is available... :-(
>>
>> Maybe someone has an idea?
>
> Sebastian, Russell: I can't find the previous mail where you proposed
> this solution to address the shared register issue between Kirkwood's
> watchdog and clocksource.

Russell first mentioned an atomic modify function here:
http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html

Linus Walleij already suggested mfd/syscon as a container
for protected register accesses later.

> Is this what you had in mind?

The pro of a generic atomic clear/set is that we can use it
very early, on all platforms, and from totally unrelated
drivers. As you already mentioned, using syscon from timers will
get us into into trouble, because it has not been registered.

> Do you think trying to use a regmap could be better (given we can
> sort out the problem explained above)?

Given the small number of registers we need to protect and especially
for using it in timers, I'd prefer your proposal. Otherwise, I guess,
we would have to mimic mfd/syscon for time-orion and time-armada-370-xp
and make wdt-orion depend on it. I doubt we can make any use of
mfd/syscon for the timer use case.

Sebastian

2013-08-12 17:09:09

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

Sebastian,

On Mon, Aug 12, 2013 at 06:44:10PM +0200, Sebastian Hesselbarth wrote:
> On 08/12/13 17:46, Ezequiel Garcia wrote:
> >> Indeed, syscon looks like a nice match for this use case.
> >> (although it still looks like an overkill to me).
> >>
> >> I've been trying to implement a working solution based in syscon but I'm
> >> unable to overcome an issue.
> >>
> >> The problem is that we need the register/regmap to initialize the clocksource
> >> driver for this machine (aka the timer). Of course, this happens at a
> >> *very* early point, way before the syscon driver is available... :-(
> >>
> >> Maybe someone has an idea?
> >
> > Sebastian, Russell: I can't find the previous mail where you proposed
> > this solution to address the shared register issue between Kirkwood's
> > watchdog and clocksource.
>
> Russell first mentioned an atomic modify function here:
> http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html
>

Thanks a lot for finding this thread. I see we all just went through the
same line of reasoning.

>
> The pro of a generic atomic clear/set is that we can use it
> very early, on all platforms, and from totally unrelated
> drivers. As you already mentioned, using syscon from timers will
> get us into into trouble, because it has not been registered.
>

Yes, indeed.

> > Do you think trying to use a regmap could be better (given we can
> > sort out the problem explained above)?
>
> Given the small number of registers we need to protect and especially
> for using it in timers, I'd prefer your proposal. Otherwise, I guess,
> we would have to mimic mfd/syscon for time-orion and time-armada-370-xp
> and make wdt-orion depend on it. I doubt we can make any use of
> mfd/syscon for the timer use case.
>

Then I think we all agree here. Just to confirm:

* The proposed API is almost exactly the one proposed by Russell
in the mail you just mentioned:
http://archive.arm.linux.org.uk/lurker/message/20130618.113606.d7d4fe4b.en.html

* Linus Walleij suggested mfd/syscon, but Russell, Mark and Linus
itself seem to agree it's more heavy-weight than necessary.
http://archive.arm.linux.org.uk/lurker/message/20130618.151116.712407e0.en.html
http://archive.arm.linux.org.uk/lurker/message/20130618.183359.a6184b7f.en.html
http://archive.arm.linux.org.uk/lurker/message/20130618.152300.bffa038f.en.html

The only open question is: given there's nothing arch-dependent in this
mechanism, should we keep this in arch/arm/kernel? And if not, where
should we move this?
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-12 18:30:23

by Will Deacon

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Sat, Aug 10, 2013 at 01:43:00PM +0100, Ezequiel Garcia wrote:
> Some SoC have MMIO regions that are shared across orthogonal
> subsystems. This commit implements a possible solution for the
> thread-safe access of such regions through a spinlock-protected API
> with clear-set semantics.
>
> Concurrent access is protected with a single spinlock for the
> entire MMIO address space. While this protects shared-registers,
> it also serializes access to unrelated/unshared registers.
>
> Signed-off-by: Ezequiel Garcia <[email protected]>

[...]

> +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> +{
> + spin_lock(&__io_lock);
> + writel((readl(reg) & ~clear) | set, reg);
> + spin_unlock(&__io_lock);
> +}

I appreciate that you've lifted this code from a previous driver, but this
doesn't really make any sense to me. The spin_unlock operation is
essentially a store to normal, cacheable memory, whilst the writel is an
__iowmb followed by a store to device memory.

This means that you don't have ordering guarantees between the two accesses
outside of the CPU, potentially giving you:

spin_lock(&__io_lock);
spin_unlock(&__io_lock);
writel((readl(reg) & ~clear) | set, reg);

which is probably not what you want.

I suggest adding an iowmb after the writel if you really need this ordering
to be enforced (but this may have a significant performance impact,
depending on your SoC).

Will

2013-08-19 16:59:58

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> On Sat, Aug 10, 2013 at 01:43:00PM +0100, Ezequiel Garcia wrote:
> > Some SoC have MMIO regions that are shared across orthogonal
> > subsystems. This commit implements a possible solution for the
> > thread-safe access of such regions through a spinlock-protected API
> > with clear-set semantics.
> >
> > Concurrent access is protected with a single spinlock for the
> > entire MMIO address space. While this protects shared-registers,
> > it also serializes access to unrelated/unshared registers.
> >
> > Signed-off-by: Ezequiel Garcia <[email protected]>
>
> [...]
>
> > +void atomic_io_clear_set(void __iomem *reg, u32 clear, u32 set)
> > +{
> > + spin_lock(&__io_lock);
> > + writel((readl(reg) & ~clear) | set, reg);
> > + spin_unlock(&__io_lock);
> > +}
>
> I appreciate that you've lifted this code from a previous driver, but this
> doesn't really make any sense to me. The spin_unlock operation is
> essentially a store to normal, cacheable memory, whilst the writel is an
> __iowmb followed by a store to device memory.
>
> This means that you don't have ordering guarantees between the two accesses
> outside of the CPU, potentially giving you:
>
> spin_lock(&__io_lock);
> spin_unlock(&__io_lock);
> writel((readl(reg) & ~clear) | set, reg);
>
> which is probably not what you want.
>
> I suggest adding an iowmb after the writel if you really need this ordering
> to be enforced (but this may have a significant performance impact,
> depending on your SoC).
>

I don't want to argue with you, given I have zero knowledge about this
ordering issue. However let me ask you a question.

In arch/arm/include/asm/spinlock.h I'm seeing this comment:

""ARMv6 ticket-based spin-locking.
A memory barrier is required after we get a lock, and before we
release it, because V6 CPUs are assumed to have weakly ordered
memory.""

and also:

static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
smp_mb();
lock->tickets.owner++;
dsb_sev();
}

So, knowing this atomic API should work for every ARMv{N}, and not being very
sure what the call to dsb_sev() does. Would you care to explain how the above
is *not* enough to guarantee a memory barrier before the spin unlocking?

Thanks!
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-20 14:32:15

by Matt Sealey

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
<[email protected]> wrote:
> On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
>
>> This means that you don't have ordering guarantees between the two accesses
>> outside of the CPU, potentially giving you:
>>
>> spin_lock(&__io_lock);
>> spin_unlock(&__io_lock);
>> writel((readl(reg) & ~clear) | set, reg);
>>
>> which is probably not what you want.
>>
>> I suggest adding an iowmb after the writel if you really need this ordering
>> to be enforced (but this may have a significant performance impact,
>> depending on your SoC).
>
> I don't want to argue with you, given I have zero knowledge about this
> ordering issue. However let me ask you a question.
>
> In arch/arm/include/asm/spinlock.h I'm seeing this comment:
>
> ""ARMv6 ticket-based spin-locking.
> A memory barrier is required after we get a lock, and before we
> release it, because V6 CPUs are assumed to have weakly ordered
> memory.""
>
> and also:
>
> static inline void arch_spin_unlock(arch_spinlock_t *lock)
> {
> smp_mb();
> lock->tickets.owner++;
> dsb_sev();
> }
>
> So, knowing this atomic API should work for every ARMv{N}, and not being very
> sure what the call to dsb_sev() does. Would you care to explain how the above
> is *not* enough to guarantee a memory barrier before the spin unlocking?

arch_spin_[un]lock as an API is not guaranteed to use a barrier before
or after doing anything, even if this particular implementation does.

dsb_sev() is an SMP helper which does a synchronization barrier and
then sends events to other CPUs which informs them of the unlock. If
the other CPUs were in WFE state waiting on that spinlock, they can
now thunder in and attempt to lock it themselves. It's not really
relevant to the discussion as arch_spin_unlock is not guaranteed to
perform a barrier before returning.

On some other architecture there may be ISA additions which make
locking barrier-less, or on a specific implementation of an ARM
architecture SoC whereby there may be a bank of "hardware spinlocks"
available.

So, in this sense, you shouldn't rely on implementation-specific
behaviors of a function. If you need to be sure C follows B follows A,
insert a barrier yourself. Don't expect A to barrier for you just
because you saw some source code that does it today, as tomorrow it
may be different. It's not an optimization, just a potential source of
new bugs if the implementation changes underneath you later.

--
Matt

2013-08-20 14:52:03

by Ezequiel Garcia

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Tue, Aug 20, 2013 at 09:32:13AM -0500, Matt Sealey wrote:
> On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
> <[email protected]> wrote:
> > On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> >
> >> This means that you don't have ordering guarantees between the two accesses
> >> outside of the CPU, potentially giving you:
> >>
> >> spin_lock(&__io_lock);
> >> spin_unlock(&__io_lock);
> >> writel((readl(reg) & ~clear) | set, reg);
> >>
> >> which is probably not what you want.
> >>
> >> I suggest adding an iowmb after the writel if you really need this ordering
> >> to be enforced (but this may have a significant performance impact,
> >> depending on your SoC).
> >
> > I don't want to argue with you, given I have zero knowledge about this
> > ordering issue. However let me ask you a question.
> >
> > In arch/arm/include/asm/spinlock.h I'm seeing this comment:
> >
> > ""ARMv6 ticket-based spin-locking.
> > A memory barrier is required after we get a lock, and before we
> > release it, because V6 CPUs are assumed to have weakly ordered
> > memory.""
> >
> > and also:
> >
> > static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > {
> > smp_mb();
> > lock->tickets.owner++;
> > dsb_sev();
> > }
> >
> > So, knowing this atomic API should work for every ARMv{N}, and not being very
> > sure what the call to dsb_sev() does. Would you care to explain how the above
> > is *not* enough to guarantee a memory barrier before the spin unlocking?
>
> arch_spin_[un]lock as an API is not guaranteed to use a barrier before
> or after doing anything, even if this particular implementation does.
>
> dsb_sev() is an SMP helper which does a synchronization barrier and
> then sends events to other CPUs which informs them of the unlock. If
> the other CPUs were in WFE state waiting on that spinlock, they can
> now thunder in and attempt to lock it themselves. It's not really
> relevant to the discussion as arch_spin_unlock is not guaranteed to
> perform a barrier before returning.
>
> On some other architecture there may be ISA additions which make
> locking barrier-less, or on a specific implementation of an ARM
> architecture SoC whereby there may be a bank of "hardware spinlocks"
> available.
>
> So, in this sense, you shouldn't rely on implementation-specific
> behaviors of a function. If you need to be sure C follows B follows A,
> insert a barrier yourself. Don't expect A to barrier for you just
> because you saw some source code that does it today, as tomorrow it
> may be different. It's not an optimization, just a potential source of
> new bugs if the implementation changes underneath you later.
>

Of course. I agree completely.

Thanks a lot,
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

2013-08-20 15:05:13

by Will Deacon

[permalink] [raw]

Subject: Re: [PATCH 1/3] ARM: Introduce atomic MMIO clear/set

On Tue, Aug 20, 2013 at 03:52:00PM +0100, Ezequiel Garcia wrote:
> On Tue, Aug 20, 2013 at 09:32:13AM -0500, Matt Sealey wrote:
> > On Mon, Aug 19, 2013 at 11:59 AM, Ezequiel Garcia
> > <[email protected]> wrote:
> > > On Mon, Aug 12, 2013 at 07:29:42PM +0100, Will Deacon wrote:
> > >> I suggest adding an iowmb after the writel if you really need this ordering
> > >> to be enforced (but this may have a significant performance impact,
> > >> depending on your SoC).
> > >
> > > I don't want to argue with you, given I have zero knowledge about this
> > > ordering issue. However let me ask you a question.
> > >
> > > In arch/arm/include/asm/spinlock.h I'm seeing this comment:
> > >
> > > ""ARMv6 ticket-based spin-locking.
> > > A memory barrier is required after we get a lock, and before we
> > > release it, because V6 CPUs are assumed to have weakly ordered
> > > memory.""
> > >
> > > and also:
> > >
> > > static inline void arch_spin_unlock(arch_spinlock_t *lock)
> > > {
> > > smp_mb();
> > > lock->tickets.owner++;
> > > dsb_sev();
> > > }
> > >
> > > So, knowing this atomic API should work for every ARMv{N}, and not being very
> > > sure what the call to dsb_sev() does. Would you care to explain how the above
> > > is *not* enough to guarantee a memory barrier before the spin unlocking?
> >
> > arch_spin_[un]lock as an API is not guaranteed to use a barrier before
> > or after doing anything, even if this particular implementation does.

[...]

> Of course. I agree completely.

Well, even if the barrier was guaranteed by the API, it's still not
sufficient to ensure ordering between two different memory types. For
example, on Cortex-A9 with PL310, you would also need to perform an
outer_sync() operation before the unlock.

Will