2013-04-10 03:08:40

by Rob Herring

[permalink] [raw]
Subject: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

From: Rob Herring <[email protected]>

Atomic operations are undefined behavior on ARM for device or strongly
ordered memory types. So use write-combine variants for mappings. This
corresponds to normal, non-cacheable memory on ARM. For many other
architectures, this change should not change the mapping type.

Signed-off-by: Rob Herring <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
---
fs/pstore/ram_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 0306303..e126d9f 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -337,7 +337,7 @@ static void *persistent_ram_vmap(phys_addr_t start, size_t size)
page_start = start - offset_in_page(start);
page_count = DIV_ROUND_UP(size + offset_in_page(start), PAGE_SIZE);

- prot = pgprot_noncached(PAGE_KERNEL);
+ prot = pgprot_writecombine(PAGE_KERNEL);

pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL);
if (!pages) {
@@ -364,7 +364,7 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size)
return NULL;
}

- return ioremap(start, size);
+ return ioremap_wc(start, size);
}

static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size,
--
1.7.10.4


2013-04-10 03:08:44

by Rob Herring

[permalink] [raw]
Subject: [RFC PATCH 2/3] pstore ram: remove the power of buffer size limitation

From: Rob Herring <[email protected]>

There doesn't appear to be any reason for the overall pstore RAM buffer to
be a power of 2 size, so remove it. The individual console, ftrace and oops
buffers are still a power of 2 size.

Signed-off-by: Rob Herring <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
---
fs/pstore/ram.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 288f068..f980077 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -391,8 +391,6 @@ static int ramoops_probe(struct platform_device *pdev)
goto fail_out;
}

- if (!is_power_of_2(pdata->mem_size))
- pdata->mem_size = rounddown_pow_of_two(pdata->mem_size);
if (!is_power_of_2(pdata->record_size))
pdata->record_size = rounddown_pow_of_two(pdata->record_size);
if (!is_power_of_2(pdata->console_size))
--
1.7.10.4

2013-04-10 03:08:58

by Rob Herring

[permalink] [raw]
Subject: [RFC PATCH 3/3] pstore/ram: avoid atomic accesses for ioremapped regions

From: Rob Herring <[email protected]>

For persistent RAM outside of main memory, the memory may have limitations
on supported accesses. For internal RAM on highbank platform exclusive
accesses are not supported and will hang the system. So atomic_cmpxchg
cannot be used. This commit uses spinlock protection for buffer size and
start updates on ioremapped regions instead.

Signed-off-by: Rob Herring <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: [email protected]
---
fs/pstore/ram_core.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index e126d9f..97e640b 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -46,7 +46,7 @@ static inline size_t buffer_start(struct persistent_ram_zone *prz)
}

/* increase and wrap the start pointer, returning the old value */
-static inline size_t buffer_start_add(struct persistent_ram_zone *prz, size_t a)
+static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t a)
{
int old;
int new;
@@ -62,7 +62,7 @@ static inline size_t buffer_start_add(struct persistent_ram_zone *prz, size_t a)
}

/* increase the size counter until it hits the max size */
-static inline void buffer_size_add(struct persistent_ram_zone *prz, size_t a)
+static void buffer_size_add_atomic(struct persistent_ram_zone *prz, size_t a)
{
size_t old;
size_t new;
@@ -78,6 +78,53 @@ static inline void buffer_size_add(struct persistent_ram_zone *prz, size_t a)
} while (atomic_cmpxchg(&prz->buffer->size, old, new) != old);
}

+static DEFINE_RAW_SPINLOCK(buffer_lock);
+
+/* increase and wrap the start pointer, returning the old value */
+static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t a)
+{
+ int old;
+ int new;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&buffer_lock, flags);
+
+ old = atomic_read(&prz->buffer->start);
+ new = old + a;
+ while (unlikely(new > prz->buffer_size))
+ new -= prz->buffer_size;
+ atomic_set(&prz->buffer->start, new);
+
+ raw_spin_unlock_irqrestore(&buffer_lock, flags);
+
+ return old;
+}
+
+/* increase the size counter until it hits the max size */
+static void buffer_size_add_locked(struct persistent_ram_zone *prz, size_t a)
+{
+ size_t old;
+ size_t new;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&buffer_lock, flags);
+
+ old = atomic_read(&prz->buffer->size);
+ if (old == prz->buffer_size)
+ goto exit;
+
+ new = old + a;
+ if (new > prz->buffer_size)
+ new = prz->buffer_size;
+ atomic_set(&prz->buffer->size, new);
+
+exit:
+ raw_spin_unlock_irqrestore(&buffer_lock, flags);
+}
+
+static size_t (*buffer_start_add)(struct persistent_ram_zone *, size_t) = buffer_start_add_atomic;
+static void (*buffer_size_add)(struct persistent_ram_zone *, size_t) = buffer_size_add_atomic;
+
static void notrace persistent_ram_encode_rs8(struct persistent_ram_zone *prz,
uint8_t *data, size_t len, uint8_t *ecc)
{
@@ -364,6 +411,9 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size)
return NULL;
}

+ buffer_start_add = buffer_start_add_locked;
+ buffer_size_add = buffer_size_add_locked;
+
return ioremap_wc(start, size);
}

--
1.7.10.4

2013-04-10 03:53:20

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
> From: Rob Herring <[email protected]>
>
> Atomic operations are undefined behavior on ARM for device or strongly
> ordered memory types. So use write-combine variants for mappings. This
> corresponds to normal, non-cacheable memory on ARM. For many other
> architectures, this change should not change the mapping type.

This is going to make ramconsole less reliable. A debugging printk
followed by a __raw_writel that causes an immediate hard crash is
likely to lose the last updates, including the most useful message, in
the write buffers.

Also, isn't this patch unnecessary after patch 3 in this set?

> Signed-off-by: Rob Herring <[email protected]>
> Cc: Anton Vorontsov <[email protected]>
> Cc: Colin Cross <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Tony Luck <[email protected]>
> Cc: [email protected]
> ---
> fs/pstore/ram_core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
> index 0306303..e126d9f 100644
> --- a/fs/pstore/ram_core.c
> +++ b/fs/pstore/ram_core.c
> @@ -337,7 +337,7 @@ static void *persistent_ram_vmap(phys_addr_t start, size_t size)
> page_start = start - offset_in_page(start);
> page_count = DIV_ROUND_UP(size + offset_in_page(start), PAGE_SIZE);
>
> - prot = pgprot_noncached(PAGE_KERNEL);
> + prot = pgprot_writecombine(PAGE_KERNEL);
Is this necessary? Won't pgprot_noncached already be normal memory?

> pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL);
> if (!pages) {
> @@ -364,7 +364,7 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size)
> return NULL;
> }
>
> - return ioremap(start, size);
> + return ioremap_wc(start, size);

ioremap_wc corresponds to MT_DEVICE_WC, which is still device memory,
so I don't see how this helps solve the problem in the commit message.

> }
>
> static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size,
> --
> 1.7.10.4
>

2013-04-10 04:10:40

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] pstore/ram: avoid atomic accesses for ioremapped regions

On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
> From: Rob Herring <[email protected]>
>
> For persistent RAM outside of main memory, the memory may have limitations
> on supported accesses. For internal RAM on highbank platform exclusive
> accesses are not supported and will hang the system. So atomic_cmpxchg
> cannot be used. This commit uses spinlock protection for buffer size and
> start updates on ioremapped regions instead.

I used atomics in persistent_ram to support persistent ftrace, which
now exists as PSTORE_FTRACE. At some point during development I had
trouble with recursive tracing causing an infinite loop, so you may
want to test that calling out to spinlock functions with PSTORE_FTRACE
turned on and enabled doesn't cause a problem.

2013-04-10 13:31:03

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On 04/09/2013 10:53 PM, Colin Cross wrote:
> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
>> From: Rob Herring <[email protected]>
>>
>> Atomic operations are undefined behavior on ARM for device or strongly
>> ordered memory types. So use write-combine variants for mappings. This
>> corresponds to normal, non-cacheable memory on ARM. For many other
>> architectures, this change should not change the mapping type.
>
> This is going to make ramconsole less reliable. A debugging printk
> followed by a __raw_writel that causes an immediate hard crash is
> likely to lose the last updates, including the most useful message, in
> the write buffers.

It would have to be a write that hangs the bus. In my experience with
AXI, the bus doesn't actually hang until you hit max outstanding
transactions.

I think exclusive stores will limit the buffering, but that is probably
not architecturally guaranteed.

I could put a wb() in at the end of persistent_ram_write.

> Also, isn't this patch unnecessary after patch 3 in this set?

It is still needed in the main memory case to be architecturally correct
to avoid multiple mappings of different memory types and exclusive
accesses to device memory. At least on an A9, it doesn't really seem to
matter. I could remove this for the ioremap case.

Rob

>> Signed-off-by: Rob Herring <[email protected]>
>> Cc: Anton Vorontsov <[email protected]>
>> Cc: Colin Cross <[email protected]>
>> Cc: Kees Cook <[email protected]>
>> Cc: Tony Luck <[email protected]>
>> Cc: [email protected]
>> ---
>> fs/pstore/ram_core.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
>> index 0306303..e126d9f 100644
>> --- a/fs/pstore/ram_core.c
>> +++ b/fs/pstore/ram_core.c
>> @@ -337,7 +337,7 @@ static void *persistent_ram_vmap(phys_addr_t start, size_t size)
>> page_start = start - offset_in_page(start);
>> page_count = DIV_ROUND_UP(size + offset_in_page(start), PAGE_SIZE);
>>
>> - prot = pgprot_noncached(PAGE_KERNEL);
>> + prot = pgprot_writecombine(PAGE_KERNEL);
> Is this necessary? Won't pgprot_noncached already be normal memory?
>
>> pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL);
>> if (!pages) {
>> @@ -364,7 +364,7 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t size)
>> return NULL;
>> }
>>
>> - return ioremap(start, size);
>> + return ioremap_wc(start, size);
>
> ioremap_wc corresponds to MT_DEVICE_WC, which is still device memory,
> so I don't see how this helps solve the problem in the commit message.
>
>> }
>>
>> static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size,
>> --
>> 1.7.10.4
>>

2013-04-10 15:55:39

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 3/3] pstore/ram: avoid atomic accesses for ioremapped regions

On 04/09/2013 11:10 PM, Colin Cross wrote:
> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
>> From: Rob Herring <[email protected]>
>>
>> For persistent RAM outside of main memory, the memory may have limitations
>> on supported accesses. For internal RAM on highbank platform exclusive
>> accesses are not supported and will hang the system. So atomic_cmpxchg
>> cannot be used. This commit uses spinlock protection for buffer size and
>> start updates on ioremapped regions instead.
>
> I used atomics in persistent_ram to support persistent ftrace, which
> now exists as PSTORE_FTRACE. At some point during development I had
> trouble with recursive tracing causing an infinite loop, so you may
> want to test that calling out to spinlock functions with PSTORE_FTRACE
> turned on and enabled doesn't cause a problem.

I've tested that now and it appears to work fine. Was there some
specific setup of ftrace that caused problems?

Rob

2013-04-15 22:21:17

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Wed, Apr 10, 2013 at 6:30 AM, Rob Herring <[email protected]> wrote:
> On 04/09/2013 10:53 PM, Colin Cross wrote:
>> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
>>> From: Rob Herring <[email protected]>
>>>
>>> Atomic operations are undefined behavior on ARM for device or strongly
>>> ordered memory types. So use write-combine variants for mappings. This
>>> corresponds to normal, non-cacheable memory on ARM. For many other
>>> architectures, this change should not change the mapping type.
>>
>> This is going to make ramconsole less reliable. A debugging printk
>> followed by a __raw_writel that causes an immediate hard crash is
>> likely to lose the last updates, including the most useful message, in
>> the write buffers.
>
> It would have to be a write that hangs the bus. In my experience with
> AXI, the bus doesn't actually hang until you hit max outstanding
> transactions.

I've seen many cases where a single write to device memory in an
unclocked slave will completely and instantly hang all cpus, and the
next write will never happen.

> I think exclusive stores will limit the buffering, but that is probably
> not architecturally guaranteed.
>
> I could put a wb() in at the end of persistent_ram_write.
>
>> Also, isn't this patch unnecessary after patch 3 in this set?
>
> It is still needed in the main memory case to be architecturally correct
> to avoid multiple mappings of different memory types and exclusive
> accesses to device memory. At least on an A9, it doesn't really seem to
> matter. I could remove this for the ioremap case.

According to my reading of the latest ARM ARM (Issue C, section
A3.5.7), and Catalin's excellent explanation
(http://lists.linaro.org/pipermail/linaro-dev/2012-February/010239.html),
it is no longer considered unpredictable to have both cached and
non-cached mappings to the same memory, as long as you use proper
cache maintenance between accessing the two mappings.

In pstore_ram the cached mapping will never be accessed (and we don't
care about speculative accesses), so no cache maintenance is
necessary. I don't see any need for this patch, and I see plenty of
possible problems.

2013-04-15 23:59:20

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On 04/15/2013 05:21 PM, Colin Cross wrote:
> On Wed, Apr 10, 2013 at 6:30 AM, Rob Herring <[email protected]> wrote:
>> On 04/09/2013 10:53 PM, Colin Cross wrote:
>>> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
>>>> From: Rob Herring <[email protected]>
>>>>
>>>> Atomic operations are undefined behavior on ARM for device or strongly
>>>> ordered memory types. So use write-combine variants for mappings. This
>>>> corresponds to normal, non-cacheable memory on ARM. For many other
>>>> architectures, this change should not change the mapping type.
>>>
>>> This is going to make ramconsole less reliable. A debugging printk
>>> followed by a __raw_writel that causes an immediate hard crash is
>>> likely to lose the last updates, including the most useful message, in
>>> the write buffers.
>>
>> It would have to be a write that hangs the bus. In my experience with
>> AXI, the bus doesn't actually hang until you hit max outstanding
>> transactions.
>
> I've seen many cases where a single write to device memory in an
> unclocked slave will completely and instantly hang all cpus, and the
> next write will never happen.
>
>> I think exclusive stores will limit the buffering, but that is probably
>> not architecturally guaranteed.
>>
>> I could put a wb() in at the end of persistent_ram_write.
>>
>>> Also, isn't this patch unnecessary after patch 3 in this set?
>>
>> It is still needed in the main memory case to be architecturally correct
>> to avoid multiple mappings of different memory types and exclusive
>> accesses to device memory. At least on an A9, it doesn't really seem to
>> matter. I could remove this for the ioremap case.
>
> According to my reading of the latest ARM ARM (Issue C, section
> A3.5.7), and Catalin's excellent explanation
> (http://lists.linaro.org/pipermail/linaro-dev/2012-February/010239.html),
> it is no longer considered unpredictable to have both cached and
> non-cached mappings to the same memory, as long as you use proper
> cache maintenance between accessing the two mappings.
>
> In pstore_ram the cached mapping will never be accessed (and we don't
> care about speculative accesses), so no cache maintenance is
> necessary. I don't see any need for this patch, and I see plenty of
> possible problems.

Exclusive accesses still have further restrictions. From section 3.4.5:

? It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
performed to a memory region
with the Device or Strongly-ordered memory attribute. Unless the
implementation documentation explicitly
states that LDREX and STREX operations to a memory region with the
Device or Strongly-ordered attribute are
permitted, the effect of such operations is UNPREDICTABLE.


Given that it is implementation defined, I don't see how Linux can rely
on that behavior.

Rob

2013-04-16 00:43:16

by Colin Cross

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Mon, Apr 15, 2013 at 4:59 PM, Rob Herring <[email protected]> wrote:
> On 04/15/2013 05:21 PM, Colin Cross wrote:
>> On Wed, Apr 10, 2013 at 6:30 AM, Rob Herring <[email protected]> wrote:
>>> On 04/09/2013 10:53 PM, Colin Cross wrote:
>>>> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
>>>>> From: Rob Herring <[email protected]>
>>>>>
>>>>> Atomic operations are undefined behavior on ARM for device or strongly
>>>>> ordered memory types. So use write-combine variants for mappings. This
>>>>> corresponds to normal, non-cacheable memory on ARM. For many other
>>>>> architectures, this change should not change the mapping type.
>>>>
>>>> This is going to make ramconsole less reliable. A debugging printk
>>>> followed by a __raw_writel that causes an immediate hard crash is
>>>> likely to lose the last updates, including the most useful message, in
>>>> the write buffers.
>>>
>>> It would have to be a write that hangs the bus. In my experience with
>>> AXI, the bus doesn't actually hang until you hit max outstanding
>>> transactions.
>>
>> I've seen many cases where a single write to device memory in an
>> unclocked slave will completely and instantly hang all cpus, and the
>> next write will never happen.
>>
>>> I think exclusive stores will limit the buffering, but that is probably
>>> not architecturally guaranteed.
>>>
>>> I could put a wb() in at the end of persistent_ram_write.
>>>
>>>> Also, isn't this patch unnecessary after patch 3 in this set?
>>>
>>> It is still needed in the main memory case to be architecturally correct
>>> to avoid multiple mappings of different memory types and exclusive
>>> accesses to device memory. At least on an A9, it doesn't really seem to
>>> matter. I could remove this for the ioremap case.
>>
>> According to my reading of the latest ARM ARM (Issue C, section
>> A3.5.7), and Catalin's excellent explanation
>> (http://lists.linaro.org/pipermail/linaro-dev/2012-February/010239.html),
>> it is no longer considered unpredictable to have both cached and
>> non-cached mappings to the same memory, as long as you use proper
>> cache maintenance between accessing the two mappings.
>>
>> In pstore_ram the cached mapping will never be accessed (and we don't
>> care about speculative accesses), so no cache maintenance is
>> necessary. I don't see any need for this patch, and I see plenty of
>> possible problems.
>
> Exclusive accesses still have further restrictions. From section 3.4.5:
>
> ? It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
> performed to a memory region
> with the Device or Strongly-ordered memory attribute. Unless the
> implementation documentation explicitly
> states that LDREX and STREX operations to a memory region with the
> Device or Strongly-ordered attribute are
> permitted, the effect of such operations is UNPREDICTABLE.
>
>
> Given that it is implementation defined, I don't see how Linux can rely
> on that behavior.

I see, the problem is that while noncached and writecombined appear to
be similar mappings, noncached is mapped in PRRR to strongly-ordered,
while writecombined is mapped to unbufferable normal memory.

I think adding a wmb() to persistent_ram_write is going to be
expensive on cpus with outer caches like the L2X0, where wmb() will
result in a spinlock. Is there a real SoC where this doesn't work?

2013-04-16 08:44:55

by Will Deacon

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Tue, Apr 16, 2013 at 01:43:09AM +0100, Colin Cross wrote:
> On Mon, Apr 15, 2013 at 4:59 PM, Rob Herring <[email protected]> wrote:
> > Exclusive accesses still have further restrictions. From section 3.4.5:
> >
> > • It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
> > performed to a memory region
> > with the Device or Strongly-ordered memory attribute. Unless the
> > implementation documentation explicitly
> > states that LDREX and STREX operations to a memory region with the
> > Device or Strongly-ordered attribute are
> > permitted, the effect of such operations is UNPREDICTABLE.
> >
> >
> > Given that it is implementation defined, I don't see how Linux can rely
> > on that behavior.
>
> I see, the problem is that while noncached and writecombined appear to
> be similar mappings, noncached is mapped in PRRR to strongly-ordered,
> while writecombined is mapped to unbufferable normal memory.
>
> I think adding a wmb() to persistent_ram_write is going to be
> expensive on cpus with outer caches like the L2X0, where wmb() will
> result in a spinlock. Is there a real SoC where this doesn't work?

A real SoC where exclusives don't work to memory not mapped as normal? Take
your pick...

Will

2013-04-16 12:58:58

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On 04/16/2013 03:44 AM, Will Deacon wrote:
> On Tue, Apr 16, 2013 at 01:43:09AM +0100, Colin Cross wrote:
>> On Mon, Apr 15, 2013 at 4:59 PM, Rob Herring <[email protected]> wrote:
>>> Exclusive accesses still have further restrictions. From section 3.4.5:
>>>
>>> • It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
>>> performed to a memory region
>>> with the Device or Strongly-ordered memory attribute. Unless the
>>> implementation documentation explicitly
>>> states that LDREX and STREX operations to a memory region with the
>>> Device or Strongly-ordered attribute are
>>> permitted, the effect of such operations is UNPREDICTABLE.
>>>
>>>
>>> Given that it is implementation defined, I don't see how Linux can rely
>>> on that behavior.
>>
>> I see, the problem is that while noncached and writecombined appear to
>> be similar mappings, noncached is mapped in PRRR to strongly-ordered,
>> while writecombined is mapped to unbufferable normal memory.
>>
>> I think adding a wmb() to persistent_ram_write is going to be
>> expensive on cpus with outer caches like the L2X0, where wmb() will
>> result in a spinlock. Is there a real SoC where this doesn't work?
>
> A real SoC where exclusives don't work to memory not mapped as normal? Take
> your pick...

This patch doesn't actually fix problems for me. Exclusives to DDR work
for any memory type for me as the DDR controller has an exclusive
monitor. It takes write-thru cache mapping to get internal RAM to work.

Rob

2013-04-16 13:49:22

by Catalin Marinas

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Tue, Apr 16, 2013 at 01:58:27PM +0100, Rob Herring wrote:
> On 04/16/2013 03:44 AM, Will Deacon wrote:
> > On Tue, Apr 16, 2013 at 01:43:09AM +0100, Colin Cross wrote:
> >> On Mon, Apr 15, 2013 at 4:59 PM, Rob Herring <[email protected]> wrote:
> >>> Exclusive accesses still have further restrictions. From section 3.4.5:
> >>>
> >>> • It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
> >>> performed to a memory region
> >>> with the Device or Strongly-ordered memory attribute. Unless the
> >>> implementation documentation explicitly
> >>> states that LDREX and STREX operations to a memory region with the
> >>> Device or Strongly-ordered attribute are
> >>> permitted, the effect of such operations is UNPREDICTABLE.
> >>>
> >>>
> >>> Given that it is implementation defined, I don't see how Linux can rely
> >>> on that behavior.
> >>
> >> I see, the problem is that while noncached and writecombined appear to
> >> be similar mappings, noncached is mapped in PRRR to strongly-ordered,
> >> while writecombined is mapped to unbufferable normal memory.
> >>
> >> I think adding a wmb() to persistent_ram_write is going to be
> >> expensive on cpus with outer caches like the L2X0, where wmb() will
> >> result in a spinlock. Is there a real SoC where this doesn't work?
> >
> > A real SoC where exclusives don't work to memory not mapped as normal? Take
> > your pick...
>
> This patch doesn't actually fix problems for me. Exclusives to DDR work
> for any memory type for me as the DDR controller has an exclusive
> monitor. It takes write-thru cache mapping to get internal RAM to work.

I can't find any reference in the ARM ARM but I think you would need
cacheable memory for the exclusives to work. A9 for example uses the
cacheline exclusiveness to emulate the global monitor.

--
Catalin

2013-04-19 09:55:09

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

On Tue, Apr 09, 2013 at 08:53:18PM -0700, Colin Cross wrote:
> On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring <[email protected]> wrote:
> > - return ioremap(start, size);
> > + return ioremap_wc(start, size);
>
> ioremap_wc corresponds to MT_DEVICE_WC, which is still device memory,
> so I don't see how this helps solve the problem in the commit message.

In reality it isn't, because there's no such thing as "write combining
device memory" in the ARM memory model.

There are three major memory types: strongly ordered, device and normal
memory. Only normal memory can be cached in any way, which includes
using write combining.

#define ioremap_wc(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_WC)

[MT_DEVICE_WC] = { /* ioremap_wc */
.prot_pte = PROT_PTE_DEVICE | L_PTE_MT_DEV_WC,

* n TR IR OR
* BUFFERABLE 001 10 00 00
* DEV_WC 001 10
(see arch/arm/mm/proc-v7-2level.S for the rest of the table and its
description.)

So, DEV_WC is an alias for BUFFERABLE, which is normal memory,
uncacheable in both inner and outer caches. This means that at the
moment, ioremap_wc() memory has the same properties as system memory
- with all the out of ordering effects you get there.

I don't put any guarantee on this though - we may end up having to
change it if we find a SoC needing this to really be device memory...

2013-06-19 11:22:46

by Wei Ni

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

> Atomic operations are undefined behavior on ARM for device or strongly
> ordered memory types. So use write-combine variants for mappings. This
> corresponds to normal, non-cacheable memory on ARM. For many other
> architectures, this change should not change the mapping type.

Hi, all
I have met this problems on tegra soc.
I tried to use pstore-ram, but the system will hang up when run the
atomic_cmpxchg() in the buffer_start_add() or buffer_size_add(),
whatever I use iomapped or vmapped regions.

I tried to apply this patch set, it was failed with vmap, but if use
iomapped rgiions, the ldrex/strex can work on this memory, and the
ramoops driver can work.

Does there have any updates for this issue?
Or how can I debug it?

Thanks.
Wei.