2014-12-03 22:32:04

by Christian Borntraeger

Subject: [PATCHv4 0/9] ACCESS_ONCE and non-scalar accesses

As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
ACCESS_ONCE might fail with specific compilers for non-scalar accesses.

Here is a set of patches to tackle that problem.

The first patch introduces READ_ONCE and ASSIGN_ONCE. If the data structure
is larger than the machine word size, memcpy is used and a warning is emitted.
The next patches fix up all in-tree users of ACCESS_ONCE on non-scalar types.
The last patch forces ACCESS_ONCE to work only on scalar types.
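
As a rough usage sketch (the lock type and field names are hypothetical
stand-ins, not taken from the series), the new macros turn an aggregate
copy into a single scalar access when it fits in a machine word:

    struct tickets { u16 head, tail; };  /* 4 bytes: one 32-bit access */
    struct tickets snap;

    snap = READ_ONCE(lock->tickets);     /* one volatile load  */
    ASSIGN_ONCE(snap, lock->tickets);    /* one volatile store */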

I have cross-compiled the resulting kernel with defconfig and gcc 4.9 for
microblaze, m68k, alpha, s390, x86_64, i686, sparc, sparc64, mips,
ia64, arm and arm64.

Runtime tested on s390x and x86_64. I have also verified that ASSIGN_ONCE works
as expected with some test changes, as there are no users in this patch series.

Linus, ok for the next merge window?

Christian Borntraeger (9):
kernel: Provide READ_ONCE and ASSIGN_ONCE
mm: replace ACCESS_ONCE with READ_ONCE or barriers
x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
x86/gup: Replace ACCESS_ONCE with READ_ONCE
mips/gup: Replace ACCESS_ONCE with READ_ONCE
arm64/spinlock: Replace ACCESS_ONCE with READ_ONCE
arm/spinlock: Replace ACCESS_ONCE with READ_ONCE
s390/kvm: Replace ACCESS_ONCE with READ_ONCE
kernel: tighten rules for ACCESS_ONCE

arch/arm/include/asm/spinlock.h | 4 +--
arch/arm64/include/asm/spinlock.h | 4 +--
arch/mips/mm/gup.c | 2 +-
arch/s390/kvm/gaccess.c | 14 ++++----
arch/x86/include/asm/spinlock.h | 8 ++---
arch/x86/mm/gup.c | 2 +-
include/linux/compiler.h | 74 ++++++++++++++++++++++++++++++++++++++-
mm/gup.c | 2 +-
mm/memory.c | 2 +-
mm/rmap.c | 3 +-
10 files changed, 95 insertions(+), 20 deletions(-)

--
1.9.3


2014-12-03 22:30:34

by Christian Borntraeger

Subject: [PATCH 5/9] mips/gup: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
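
A minimal sketch of the failure mode (hypothetical code, condensed from
the bugzilla report): ACCESS_ONCE casts to a volatile-qualified aggregate,
and the SRA pass may decompose the temporary and lose the volatile
qualifier.

    typedef struct { unsigned long pte; } pte_t; /* aggregate on some configs */

    static pte_t get_pte(pte_t *ptep)
    {
        /* (*(volatile pte_t *)ptep) is an aggregate access; gcc 4.6/4.7
         * may rewrite it as a plain, non-volatile load that the
         * optimizer is then free to refetch. */
        return ACCESS_ONCE(*ptep);
    }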

Change the gup code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/mips/mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
index 06ce17c..8aa50e3 100644
--- a/arch/mips/mm/gup.c
+++ b/arch/mips/mm/gup.c
@@ -30,7 +30,7 @@ retry:

return pte;
#else
- return ACCESS_ONCE(*ptep);
+ return READ_ONCE(*ptep);
#endif
}

--
1.9.3

2014-12-03 22:30:33

by Christian Borntraeger

Subject: [PATCH 4/9] x86/gup: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the gup code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/x86/mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 207d9aef..d754782 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -15,7 +15,7 @@
static inline pte_t gup_get_pte(pte_t *ptep)
{
#ifndef CONFIG_X86_PAE
- return ACCESS_ONCE(*ptep);
+ return READ_ONCE(*ptep);
#else
/*
* With get_user_pages_fast, we walk down the pagetables without taking
--
1.9.3

2014-12-03 22:30:31

by Christian Borntraeger

Subject: [PATCH 6/9] arm64/spinlock: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.
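
For context, a sketch of why the single-copy read matters here (assuming
the arm64 ticket layout of two 16-bit halves in one 32-bit word):

    arch_spinlock_t lockval = READ_ONCE(*lock);  /* one 32-bit snapshot */
    /* lockval.next and lockval.owner now come from the same load, so
     * (lockval.next - lockval.owner) is a consistent distance. Without
     * the guaranteed scalar access, the compiler could fetch the two
     * halves with two separate reads of *lock. */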

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/arm64/include/asm/spinlock.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index c45b7b1..cee1287 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -99,12 +99,12 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)

static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- return !arch_spin_value_unlocked(ACCESS_ONCE(*lock));
+ return !arch_spin_value_unlocked(READ_ONCE(*lock));
}

static inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
- arch_spinlock_t lockval = ACCESS_ONCE(*lock);
+ arch_spinlock_t lockval = READ_ONCE(*lock);
return (lockval.next - lockval.owner) > 1;
}
#define arch_spin_is_contended arch_spin_is_contended
--
1.9.3

2014-12-03 22:31:15

by Christian Borntraeger

Subject: [PATCH 9/9] kernel: tighten rules for ACCESS_ONCE

Now that all non-scalar users of ACCESS_ONCE have been converted
to READ_ONCE or ASSIGN_ONCE, let's tighten ACCESS_ONCE to only
work on scalar types.
This variant was proposed by Alexei Starovoitov.
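
A hedged sketch of the effect (pte_t stands in for any aggregate type):
scalar uses still compile, aggregate uses now fail, because "= 0" is only
a valid initializer for scalar types.

    unsigned long j = ACCESS_ONCE(jiffies);  /* scalar: still fine */

    pte_t pte = ACCESS_ONCE(*ptep);  /* aggregate: build error, since
                                      * "__maybe_unused pte_t __var = 0;"
                                      * is an invalid initializer */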

Signed-off-by: Christian Borntraeger <[email protected]>
---
include/linux/compiler.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 947710e..5ba91de 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -441,8 +441,16 @@ static __always_inline void __assign_once_size(volatile void *p, void *res, int
* merging, or refetching absolutely anything at any time. Its main intended
* use is to mediate communication between process-level code and irq/NMI
* handlers, all running on the same CPU.
+ *
+ * ACCESS_ONCE will only work on scalar types. For union types, ACCESS_ONCE
+ * on a union member will work as long as the size of the member matches the
+ * size of the union and the size is smaller than word size.
+ * If possible READ_ONCE/ASSIGN_ONCE should be used instead.
*/
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+#define __ACCESS_ONCE(x) ({ \
+ __maybe_unused typeof(x) __var = 0; \
+ (volatile typeof(x) *)&(x); })
+#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))

/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
--
1.9.3

2014-12-03 22:32:00

by Christian Borntraeger

Subject: [PATCH 7/9] arm/spinlock: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/arm/include/asm/spinlock.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index ac4bfae..0fa4184 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -120,12 +120,12 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)

static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- return !arch_spin_value_unlocked(ACCESS_ONCE(*lock));
+ return !arch_spin_value_unlocked(READ_ONCE(*lock));
}

static inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
- struct __raw_tickets tickets = ACCESS_ONCE(lock->tickets);
+ struct __raw_tickets tickets = READ_ONCE(lock->tickets);
return (tickets.next - tickets.owner) > 1;
}
#define arch_spin_is_contended arch_spin_is_contended
--
1.9.3

2014-12-03 22:32:02

by Christian Borntraeger

Subject: [PATCH 8/9] s390/kvm: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the ipte lock code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/s390/kvm/gaccess.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0f961a1..8f195fa 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -229,10 +229,10 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
goto out;
ic = &vcpu->kvm->arch.sca->ipte_control;
do {
- old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
while (old.k) {
cond_resched();
- old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
}
new = old;
new.k = 1;
@@ -251,7 +251,8 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
goto out;
ic = &vcpu->kvm->arch.sca->ipte_control;
do {
- new = old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
+ new = old;
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
wake_up(&vcpu->kvm->arch.ipte_wq);
@@ -265,10 +266,10 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)

ic = &vcpu->kvm->arch.sca->ipte_control;
do {
- old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
while (old.kg) {
cond_resched();
- old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
}
new = old;
new.k = 1;
@@ -282,7 +283,8 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)

ic = &vcpu->kvm->arch.sca->ipte_control;
do {
- new = old = ACCESS_ONCE(*ic);
+ old = READ_ONCE(*ic);
+ new = old;
new.kh--;
if (!new.kh)
new.k = 0;
--
1.9.3

2014-12-03 22:32:11

by Christian Borntraeger

Subject: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
scalar types as suggested by Linus Torvalds. Accesses larger than
the machine's word size cannot be guaranteed to be atomic. These
macros will use memcpy and emit a build warning.
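
As an illustration of the fallback path (struct and pointer names are
hypothetical): an access wider than the machine word still compiles, but
goes through memcpy and triggers the new warning.

    struct wide { u64 a, b; };       /* 16 bytes: wider than any word */
    struct wide w = READ_ONCE(*wp);  /* wp is a struct wide * from
                                      * elsewhere; __read_once_size()
                                      * hits the default case, does
                                      * barrier(); __builtin_memcpy();
                                      * barrier(); and warns "data access
                                      * exceeds word size and won't be
                                      * atomic" */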

Signed-off-by: Christian Borntraeger <[email protected]>
---
include/linux/compiler.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index d5ad7b1..947710e 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -186,6 +186,70 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
#endif

+#include <linux/types.h>
+
+void data_access_exceeds_word_size(void)
+__compiletime_warning("data access exceeds word size and won't be atomic");
+
+static __always_inline void __read_once_size(volatile void *p, void *res, int size)
+{
+ switch (size) {
+ case 1: *(u8 *)res = *(volatile u8 *)p; break;
+ case 2: *(u16 *)res = *(volatile u16 *)p; break;
+ case 4: *(u32 *)res = *(volatile u32 *)p; break;
+#ifdef CONFIG_64BIT
+ case 8: *(u64 *)res = *(volatile u64 *)p; break;
+#endif
+ default:
+ barrier();
+ __builtin_memcpy((void *)res, (const void *)p, size);
+ data_access_exceeds_word_size();
+ barrier();
+ }
+}
+
+static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
+{
+ switch (size) {
+ case 1: *(volatile u8 *)p = *(u8 *)res; break;
+ case 2: *(volatile u16 *)p = *(u16 *)res; break;
+ case 4: *(volatile u32 *)p = *(u32 *)res; break;
+#ifdef CONFIG_64BIT
+ case 8: *(volatile u64 *)p = *(u64 *)res; break;
+#endif
+ default:
+ barrier();
+ __builtin_memcpy((void *)p, (const void *)res, size);
+ data_access_exceeds_word_size();
+ barrier();
+ }
+}
+
+/*
+ * Prevent the compiler from merging or refetching reads or writes. The compiler
+ * is also forbidden from reordering successive instances of READ_ONCE,
+ * ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the compiler is aware
+ * of some particular ordering. One way to make the compiler aware of ordering
+ * is to put the two invocations of READ_ONCE, ASSIGN_ONCE or ACCESS_ONCE() in
+ * different C statements.
+ *
+ * In contrast to ACCESS_ONCE these two macros will also work on aggregate data
+ * types like structs or unions. If the size of the accessed data type exceeds
+ * the word size of the machine (e.g. 32bit or 64bit), the access might happen
+ * in multiple chunks, though.
+ *
+ * These macros do absolutely -nothing- to prevent the CPU from reordering,
+ * merging, or refetching absolutely anything at any time. Their main intended
+ * use is to mediate communication between process-level code and irq/NMI
+ * handlers, all running on the same CPU.
+ */
+
+#define READ_ONCE(p) \
+ ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
+
+#define ASSIGN_ONCE(val, p) \
+ ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })
+
#endif /* __KERNEL__ */

#endif /* __ASSEMBLY__ */
--
1.9.3

2014-12-03 22:32:09

by Christian Borntraeger

Subject: [PATCH 3/9] x86/spinlock: Replace ACCESS_ONCE with READ_ONCE

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.

Signed-off-by: Christian Borntraeger <[email protected]>
---
arch/x86/include/asm/spinlock.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 9295016..12a69b4 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -92,7 +92,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
unsigned count = SPIN_THRESHOLD;

do {
- if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
+ if (READ_ONCE(lock->tickets.head) == inc.tail)
goto out;
cpu_relax();
} while (--count);
@@ -105,7 +105,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
arch_spinlock_t old, new;

- old.tickets = ACCESS_ONCE(lock->tickets);
+ old.tickets = READ_ONCE(lock->tickets);
if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
return 0;

@@ -162,14 +162,14 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)

static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
+ struct __raw_tickets tmp = READ_ONCE(lock->tickets);

return tmp.tail != tmp.head;
}

static inline int arch_spin_is_contended(arch_spinlock_t *lock)
{
- struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
+ struct __raw_tickets tmp = READ_ONCE(lock->tickets);

return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
}
--
1.9.3

2014-12-03 22:32:07

by Christian Borntraeger

Subject: [PATCH 2/9] mm: replace ACCESS_ONCE with READ_ONCE or barriers

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Let's change the code to access the page table elements with
READ_ONCE that does implicit scalar accesses.

mm_find_pmd is tricky, because m68k and sparc (32-bit) define pmd_t
as an array of longs. This code only requires that the pmd_present
and pmd_trans_huge checks are done on the same value, so a barrier
is sufficient.
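
In other words, the pattern only has to keep both checks on one local
snapshot, which barrier() is enough for:

    pmde = *pmd;   /* may be copied in several pieces on m68k/sparc32 */
    barrier();     /* compiler must not replace pmde with re-reads of *pmd */
    if (!pmd_present(pmde) || pmd_trans_huge(pmde))
        pmd = NULL;  /* both tests saw the same value of pmde */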

Signed-off-by: Christian Borntraeger <[email protected]>
---
mm/gup.c | 2 +-
mm/memory.c | 2 +-
mm/rmap.c | 3 ++-
3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index cd62c8c..f2305de 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -917,7 +917,7 @@ static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,

pudp = pud_offset(pgdp, addr);
do {
- pud_t pud = ACCESS_ONCE(*pudp);
+ pud_t pud = READ_ONCE(*pudp);

next = pud_addr_end(addr, end);
if (pud_none(pud))
diff --git a/mm/memory.c b/mm/memory.c
index 3e50383..9e0c84e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3202,7 +3202,7 @@ static int handle_pte_fault(struct mm_struct *mm,
pte_t entry;
spinlock_t *ptl;

- entry = ACCESS_ONCE(*pte);
+ entry = READ_ONCE(*pte);
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
diff --git a/mm/rmap.c b/mm/rmap.c
index 19886fb..1e54274 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -581,7 +581,8 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
* without holding anon_vma lock for write. So when looking for a
* genuine pmde (in which to find pte), test present and !THP together.
*/
- pmde = ACCESS_ONCE(*pmd);
+ pmde = *pmd;
+ barrier();
if (!pmd_present(pmde) || pmd_trans_huge(pmde))
pmd = NULL;
out:
--
1.9.3

2014-12-04 00:07:56

by Paul E. McKenney

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

On Wed, Dec 03, 2014 at 11:30:13PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
> scalar types as suggested by Linus Torvalds. Accesses larger than
> the machine's word size cannot be guaranteed to be atomic. These
> macros will use memcpy and emit a build warning.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

With or without some nits below addressed:

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> include/linux/compiler.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 64 insertions(+)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index d5ad7b1..947710e 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -186,6 +186,70 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
> #endif
>
> +#include <linux/types.h>
> +
> +void data_access_exceeds_word_size(void)
> +__compiletime_warning("data access exceeds word size and won't be atomic");
> +
> +static __always_inline void __read_once_size(volatile void *p, void *res, int size)
> +{
> + switch (size) {
> + case 1: *(u8 *)res = *(volatile u8 *)p; break;
> + case 2: *(u16 *)res = *(volatile u16 *)p; break;
> + case 4: *(u32 *)res = *(volatile u32 *)p; break;
> +#ifdef CONFIG_64BIT
> + case 8: *(u64 *)res = *(volatile u64 *)p; break;
> +#endif

You could get rid of the #ifdef above by doing "case sizeof(long)"
and switching the casts from u64 to unsigned long.

> + default:
> + barrier();
> + __builtin_memcpy((void *)res, (const void *)p, size);
> + data_access_exceeds_word_size();
> + barrier();
> + }
> +}
> +
> +static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
> +{
> + switch (size) {
> + case 1: *(volatile u8 *)p = *(u8 *)res; break;
> + case 2: *(volatile u16 *)p = *(u16 *)res; break;
> + case 4: *(volatile u32 *)p = *(u32 *)res; break;
> +#ifdef CONFIG_64BIT
> + case 8: *(volatile u64 *)p = *(u64 *)res; break;
> +#endif

Ditto.

> + default:
> + barrier();
> + __builtin_memcpy((void *)p, (const void *)res, size);
> + data_access_exceeds_word_size();
> + barrier();
> + }
> +}
> +
> +/*
> + * Prevent the compiler from merging or refetching reads or writes. The compiler
> + * is also forbidden from reordering successive instances of READ_ONCE,
> + * ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the compiler is aware
> + * of some particular ordering. One way to make the compiler aware of ordering
> + * is to put the two invocations of READ_ONCE, ASSIGN_ONCE or ACCESS_ONCE() in
> + * different C statements.
> + *
> + * In contrast to ACCESS_ONCE these two macros will also work on aggregate data
> + * types like structs or unions. If the size of the accessed data type exceeds
> + * the word size of the machine (e.g. 32bit or 64bit), the access might happen
> + * in multiple chunks, though.

This last sentence might be more clear if it was something like the
following:

If the size of the accessed data type exceeds the word size of
the machine (e.g., 32 bits or 64 bits), ACCESS_ONCE() might
carry out the access in multiple chunks, but READ_ONCE() and
ASSIGN_ONCE() will give a link-time error.

> + *
> + * These macros do absolutely -nothing- to prevent the CPU from reordering,
> + * merging, or refetching absolutely anything at any time. Their main intended
> + * use is to mediate communication between process-level code and irq/NMI
> + * handlers, all running on the same CPU.

This last sentence is now obsolete. How about something like this?

Their two major use cases are: (1) Mediating communication
between process-level code and irq/NMI handlers, all running
on the same CPU, and (2) Ensuring that the compiler does not
fold, spindle, or otherwise mutilate accesses that either do
not require ordering or that interact with an explicit memory
barrier or atomic instruction that provides the required ordering.

> + */
> +
> +#define READ_ONCE(p) \
> + ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
> +
> +#define ASSIGN_ONCE(val, p) \
> + ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })
> +
> #endif /* __KERNEL__ */
>
> #endif /* __ASSEMBLY__ */
> --
> 1.9.3
>

2014-12-04 00:10:00

by Paul E. McKenney

Subject: Re: [PATCH 2/9] mm: replace ACCESS_ONCE with READ_ONCE or barriers

On Wed, Dec 03, 2014 at 11:30:14PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Let's change the code to access the page table elements with
> READ_ONCE that does implicit scalar accesses.
>
> mm_find_pmd is tricky, because m68k and sparc (32-bit) define pmd_t
> as an array of longs. This code only requires that the pmd_present
> and pmd_trans_huge checks are done on the same value, so a barrier
> is sufficient.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> mm/gup.c | 2 +-
> mm/memory.c | 2 +-
> mm/rmap.c | 3 ++-
> 3 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index cd62c8c..f2305de 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -917,7 +917,7 @@ static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
>
> pudp = pud_offset(pgdp, addr);
> do {
> - pud_t pud = ACCESS_ONCE(*pudp);
> + pud_t pud = READ_ONCE(*pudp);
>
> next = pud_addr_end(addr, end);
> if (pud_none(pud))
> diff --git a/mm/memory.c b/mm/memory.c
> index 3e50383..9e0c84e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3202,7 +3202,7 @@ static int handle_pte_fault(struct mm_struct *mm,
> pte_t entry;
> spinlock_t *ptl;
>
> - entry = ACCESS_ONCE(*pte);
> + entry = READ_ONCE(*pte);
> if (!pte_present(entry)) {
> if (pte_none(entry)) {
> if (vma->vm_ops) {
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 19886fb..1e54274 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -581,7 +581,8 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> * without holding anon_vma lock for write. So when looking for a
> * genuine pmde (in which to find pte), test present and !THP together.
> */
> - pmde = ACCESS_ONCE(*pmd);
> + pmde = *pmd;
> + barrier();
> if (!pmd_present(pmde) || pmd_trans_huge(pmde))
> pmd = NULL;
> out:
> --
> 1.9.3
>

2014-12-04 00:10:36

by Paul E. McKenney

Subject: Re: [PATCH 3/9] x86/spinlock: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:15PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/x86/include/asm/spinlock.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 9295016..12a69b4 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -92,7 +92,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
> unsigned count = SPIN_THRESHOLD;
>
> do {
> - if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
> + if (READ_ONCE(lock->tickets.head) == inc.tail)
> goto out;
> cpu_relax();
> } while (--count);
> @@ -105,7 +105,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
> {
> arch_spinlock_t old, new;
>
> - old.tickets = ACCESS_ONCE(lock->tickets);
> + old.tickets = READ_ONCE(lock->tickets);
> if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
> return 0;
>
> @@ -162,14 +162,14 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
>
> static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> {
> - struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
> + struct __raw_tickets tmp = READ_ONCE(lock->tickets);
>
> return tmp.tail != tmp.head;
> }
>
> static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> {
> - struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
> + struct __raw_tickets tmp = READ_ONCE(lock->tickets);
>
> return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
> }
> --
> 1.9.3
>

2014-12-04 00:10:57

by Paul E. McKenney

Subject: Re: [PATCH 4/9] x86/gup: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:16PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the gup code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/x86/mm/gup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
> index 207d9aef..d754782 100644
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -15,7 +15,7 @@
> static inline pte_t gup_get_pte(pte_t *ptep)
> {
> #ifndef CONFIG_X86_PAE
> - return ACCESS_ONCE(*ptep);
> + return READ_ONCE(*ptep);
> #else
> /*
> * With get_user_pages_fast, we walk down the pagetables without taking
> --
> 1.9.3
>

2014-12-04 00:11:22

by Paul E. McKenney

Subject: Re: [PATCH 5/9] mips/gup: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:17PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the gup code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/mips/mm/gup.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/mm/gup.c b/arch/mips/mm/gup.c
> index 06ce17c..8aa50e3 100644
> --- a/arch/mips/mm/gup.c
> +++ b/arch/mips/mm/gup.c
> @@ -30,7 +30,7 @@ retry:
>
> return pte;
> #else
> - return ACCESS_ONCE(*ptep);
> + return READ_ONCE(*ptep);
> #endif
> }
>
> --
> 1.9.3
>

2014-12-04 00:11:51

by Paul E. McKenney

Subject: Re: [PATCH 6/9] arm64/spinlock: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:18PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/arm64/include/asm/spinlock.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> index c45b7b1..cee1287 100644
> --- a/arch/arm64/include/asm/spinlock.h
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -99,12 +99,12 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>
> static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> {
> - return !arch_spin_value_unlocked(ACCESS_ONCE(*lock));
> + return !arch_spin_value_unlocked(READ_ONCE(*lock));
> }
>
> static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> {
> - arch_spinlock_t lockval = ACCESS_ONCE(*lock);
> + arch_spinlock_t lockval = READ_ONCE(*lock);
> return (lockval.next - lockval.owner) > 1;
> }
> #define arch_spin_is_contended arch_spin_is_contended
> --
> 1.9.3
>

2014-12-04 00:12:09

by Paul E. McKenney

Subject: Re: [PATCH 7/9] arm/spinlock: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:19PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the spinlock code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/arm/include/asm/spinlock.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
> index ac4bfae..0fa4184 100644
> --- a/arch/arm/include/asm/spinlock.h
> +++ b/arch/arm/include/asm/spinlock.h
> @@ -120,12 +120,12 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>
> static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> {
> - return !arch_spin_value_unlocked(ACCESS_ONCE(*lock));
> + return !arch_spin_value_unlocked(READ_ONCE(*lock));
> }
>
> static inline int arch_spin_is_contended(arch_spinlock_t *lock)
> {
> - struct __raw_tickets tickets = ACCESS_ONCE(lock->tickets);
> + struct __raw_tickets tickets = READ_ONCE(lock->tickets);
> return (tickets.next - tickets.owner) > 1;
> }
> #define arch_spin_is_contended arch_spin_is_contended
> --
> 1.9.3
>

2014-12-04 00:12:50

by Paul E. McKenney

Subject: Re: [PATCH 8/9] s390/kvm: Replace ACCESS_ONCE with READ_ONCE

On Wed, Dec 03, 2014 at 11:30:20PM +0100, Christian Borntraeger wrote:
> ACCESS_ONCE does not work reliably on non-scalar types. For
> example gcc 4.6 and 4.7 might remove the volatile tag for such
> accesses during the SRA (scalar replacement of aggregates) step
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>
> Change the ipte lock code to replace ACCESS_ONCE with READ_ONCE.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

Acked-by: Paul E. McKenney <[email protected]>

> ---
> arch/s390/kvm/gaccess.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index 0f961a1..8f195fa 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -229,10 +229,10 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
> goto out;
> ic = &vcpu->kvm->arch.sca->ipte_control;
> do {
> - old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> while (old.k) {
> cond_resched();
> - old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> }
> new = old;
> new.k = 1;
> @@ -251,7 +251,8 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
> goto out;
> ic = &vcpu->kvm->arch.sca->ipte_control;
> do {
> - new = old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> + new = old;
> new.k = 0;
> } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
> wake_up(&vcpu->kvm->arch.ipte_wq);
> @@ -265,10 +266,10 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
>
> ic = &vcpu->kvm->arch.sca->ipte_control;
> do {
> - old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> while (old.kg) {
> cond_resched();
> - old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> }
> new = old;
> new.k = 1;
> @@ -282,7 +283,8 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
>
> ic = &vcpu->kvm->arch.sca->ipte_control;
> do {
> - new = old = ACCESS_ONCE(*ic);
> + old = READ_ONCE(*ic);
> + new = old;
> new.kh--;
> if (!new.kh)
> new.k = 0;
> --
> 1.9.3
>

2014-12-04 00:16:29

by Paul E. McKenney

Subject: Re: [PATCH 9/9] kernel: tighten rules for ACCESS_ONCE

On Wed, Dec 03, 2014 at 11:30:21PM +0100, Christian Borntraeger wrote:
> Now that all non-scalar users of ACCESS_ONCE have been converted
> to READ_ONCE or ASSIGN_ONCE, let's tighten ACCESS_ONCE to only
> work on scalar types.
> This variant was proposed by Alexei Starovoitov.
>
> Signed-off-by: Christian Borntraeger <[email protected]>

With or without the updated comment:

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> include/linux/compiler.h | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 947710e..5ba91de 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -441,8 +441,16 @@ static __always_inline void __assign_once_size(volatile void *p, void *res, int
> * merging, or refetching absolutely anything at any time. Its main intended
> * use is to mediate communication between process-level code and irq/NMI
> * handlers, all running on the same CPU.

This comment is obsolete in the same way as that of READ_ONCE() and
ASSIGN_ONCE(), but probably more to the point to just get rid of
ACCESS_ONCE(). ;-)

> + *
> + * ACCESS_ONCE will only work on scalar types. For union types, ACCESS_ONCE
> + * on a union member will work as long as the size of the member matches the
> + * size of the union and the size is smaller than word size.
> + * If possible READ_ONCE/ASSIGN_ONCE should be used instead.
> */
> -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#define __ACCESS_ONCE(x) ({ \
> + __maybe_unused typeof(x) __var = 0; \
> + (volatile typeof(x) *)&(x); })
> +#define ACCESS_ONCE(x) (*__ACCESS_ONCE(x))
>
> /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
> #ifdef CONFIG_KPROBES
> --
> 1.9.3
>

2014-12-04 09:24:58

by Christian Borntraeger

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

On 04.12.2014 at 01:07, Paul E. McKenney wrote:
> On Wed, Dec 03, 2014 at 11:30:13PM +0100, Christian Borntraeger wrote:
>> ACCESS_ONCE does not work reliably on non-scalar types. For
>> example gcc 4.6 and 4.7 might remove the volatile tag for such
>> accesses during the SRA (scalar replacement of aggregates) step
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
>>
>> Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
>> scalar types as suggested by Linus Torvalds. Accesses larger than
>> the machine's word size cannot be guaranteed to be atomic. These
>> macros will use memcpy and emit a build warning.
>>
>> Signed-off-by: Christian Borntraeger <[email protected]>
>
> With or without some nits below addressed:
>
> Reviewed-by: Paul E. McKenney <[email protected]>
>
>> ---
>> include/linux/compiler.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 64 insertions(+)
>>
>> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>> index d5ad7b1..947710e 100644
>> --- a/include/linux/compiler.h
>> +++ b/include/linux/compiler.h
>> @@ -186,6 +186,70 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
>> # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
>> #endif
>>
>> +#include <linux/types.h>
>> +
>> +void data_access_exceeds_word_size(void)
>> +__compiletime_warning("data access exceeds word size and won't be atomic");
>> +
>> +static __always_inline void __read_once_size(volatile void *p, void *res, int size)
>> +{
>> + switch (size) {
>> + case 1: *(u8 *)res = *(volatile u8 *)p; break;
>> + case 2: *(u16 *)res = *(volatile u16 *)p; break;
>> + case 4: *(u32 *)res = *(volatile u32 *)p; break;
>> +#ifdef CONFIG_64BIT
>> + case 8: *(u64 *)res = *(volatile u64 *)p; break;
>> +#endif
>
> You could get rid of the #ifdef above by doing "case sizeof(long)"
> and switching the casts from u64 to unsigned long.

Wouldn't that cause a compile warning because we have two case statements
with the same size?
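
For reference, a sketch of the collision on a 32-bit build, where
sizeof(long) == 4:

    switch (size) {
    case 4:            break;  /* u32 */
    case sizeof(long): break;  /* also 4 here: gcc rejects this with
                                * "duplicate case value" */
    }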

>
>> + default:
>> + barrier();
>> + __builtin_memcpy((void *)res, (const void *)p, size);
>> + data_access_exceeds_word_size();
>> + barrier();
>> + }
>> +}
>> +
>> +static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
>> +{
>> + switch (size) {
>> + case 1: *(volatile u8 *)p = *(u8 *)res; break;
>> + case 2: *(volatile u16 *)p = *(u16 *)res; break;
>> + case 4: *(volatile u32 *)p = *(u32 *)res; break;
>> +#ifdef CONFIG_64BIT
>> + case 8: *(volatile u64 *)p = *(u64 *)res; break;
>> +#endif
>
> Ditto.
>
>> + default:
>> + barrier();
>> + __builtin_memcpy((void *)p, (const void *)res, size);
>> + data_access_exceeds_word_size();
>> + barrier();
>> + }
>> +}
>> +
>> +/*
>> + * Prevent the compiler from merging or refetching reads or writes. The compiler
>> + * is also forbidden from reordering successive instances of READ_ONCE,
>> + * ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the compiler is aware
>> + * of some particular ordering. One way to make the compiler aware of ordering
>> + * is to put the two invocations of READ_ONCE, ASSIGN_ONCE or ACCESS_ONCE() in
>> + * different C statements.
>> + *
>> + * In contrast to ACCESS_ONCE these two macros will also work on aggregate data
>> + * types like structs or unions. If the size of the accessed data type exceeds
>> + * the word size of the machine (e.g. 32bit or 64bit), the access might happen
>> + * in multiple chunks, though.
>
> This last sentence might be more clear if it was something like the
> following:
>
> If the size of the accessed data type exceeeds the word size of
> the machine (e.g., 32 bits or 64 bits), ACCESS_ONCE() might
> carry out the access in multiple chunks, but READ_ONCE() and
> ASSIGN_ONCE() will give a link-time error.

The code in v4 now combines Linus' (memcpy) and your (error) suggestions. We do a memcpy and a compile-time _warning_.

So I will do

* In contrast to ACCESS_ONCE these two macros will also work on aggregate
* data types like structs or unions. If the size of the accessed data
* type exceeds the word size of the machine (e.g., 32 bits or 64 bits),
* READ_ONCE() and ASSIGN_ONCE() will fall back to memcpy and print a
* compile-time warning.



>
>> + *
>> + * These macros do absolutely -nothing- to prevent the CPU from reordering,
>> + * merging, or refetching absolutely anything at any time. Their main intended
>> + * use is to mediate communication between process-level code and irq/NMI
>> + * handlers, all running on the same CPU.
>
> This last sentence is now obsolete. How about something like this?
>
> Their two major use cases are: (1) Mediating communication
> between process-level code and irq/NMI handlers, all running
> on the same CPU, and (2) Ensuring that the compiler does not
> fold, spindle, or otherwise mutilate accesses that either do
> not require ordering or that interact with an explicit memory
> barrier or atomic instruction that provides the required ordering.

sounds good. Will change.

Thanks

>
>> + */
>> +
>> +#define READ_ONCE(p) \
>> + ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
>> +
>> +#define ASSIGN_ONCE(val, p) \
>> + ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })
>> +
>> #endif /* __KERNEL__ */
>>
>> #endif /* __ASSEMBLY__ */
>> --
>> 1.9.3
>>

2014-12-04 09:28:32

by Christian Borntraeger

Subject: Re: [PATCH 9/9] kernel: tighten rules for ACCESS_ONCE

On 04.12.2014 at 01:16, Paul E. McKenney wrote:
>> * merging, or refetching absolutely anything at any time. Its main intended
>> * use is to mediate communication between process-level code and irq/NMI
>> * handlers, all running on the same CPU.
> This comment is obsolete in the same way as that of READ_ONCE() and
> ASSIGN_ONCE(), but probably more to the point to just get rid of
> ACCESS_ONCE(). ;-)
>

It's now

/*
* Prevent the compiler from merging or refetching accesses. The compiler
* is also forbidden from reordering successive instances of ACCESS_ONCE(),
* but only when the compiler is aware of some particular ordering. One way
* to make the compiler aware of ordering is to put the two invocations of
* ACCESS_ONCE() in different C statements.
*
* ACCESS_ONCE will only work on scalar types. For union types, ACCESS_ONCE
* on a union member will work as long as the size of the member matches the
* size of the union and the size is smaller than word size.
*
* The major use cases of ACCESS_ONCE used to be (1) Mediating communication
* between process-level code and irq/NMI handlers, all running on the same CPU,
* and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
*
* If possible use READ_ONCE/ASSIGN_ONCE instead.
*/
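
A hedged example of the union rule above, loosely modeled on the s390
ipte_control type (the bitfield layout is illustrative only):

    union ipte_control {
        unsigned long val;
        struct { unsigned long k : 1, kh : 31; };  /* layout illustrative */
    } old, *ic;

    old.val = ACCESS_ONCE(ic->val);  /* ok: "val" is a scalar member */
    old = ACCESS_ONCE(*ic);          /* no longer builds: the union
                                      * itself is not a scalar type  */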

2014-12-04 14:41:23

by Paul E. McKenney

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

On Thu, Dec 04, 2014 at 10:24:47AM +0100, Christian Borntraeger wrote:
> On 04.12.2014 at 01:07, Paul E. McKenney wrote:
> > On Wed, Dec 03, 2014 at 11:30:13PM +0100, Christian Borntraeger wrote:
> >> ACCESS_ONCE does not work reliably on non-scalar types. For
> >> example gcc 4.6 and 4.7 might remove the volatile tag for such
> >> accesses during the SRA (scalar replacement of aggregates) step
> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
> >>
> >> Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
> >> scalar types as suggested by Linus Torvalds. Accesses larger than
> >> the machines word size cannot be guaranteed to be atomic. These
> >> macros will use memcpy and emit a build warning.
> >>
> >> Signed-off-by: Christian Borntraeger <[email protected]>
> >
> > With or without some nits below addressed:
> >
> > Reviewed-by: Paul E. McKenney <[email protected]>
> >
> >> ---
> >> include/linux/compiler.h | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 64 insertions(+)
> >>
> >> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> >> index d5ad7b1..947710e 100644
> >> --- a/include/linux/compiler.h
> >> +++ b/include/linux/compiler.h
> >> @@ -186,6 +186,70 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
> >> # define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
> >> #endif
> >>
> >> +#include <linux/types.h>
> >> +
> >> +void data_access_exceeds_word_size(void)
> >> +__compiletime_warning("data access exceeds word size and won't be atomic");
> >> +
> >> +static __always_inline void __read_once_size(volatile void *p, void *res, int size)
> >> +{
> >> + switch (size) {
> >> + case 1: *(u8 *)res = *(volatile u8 *)p; break;
> >> + case 2: *(u16 *)res = *(volatile u16 *)p; break;
> >> + case 4: *(u32 *)res = *(volatile u32 *)p; break;
> >> +#ifdef CONFIG_64BIT
> >> + case 8: *(u64 *)res = *(volatile u64 *)p; break;
> >> +#endif
> >
> > You could get rid of the #ifdef above by doing "case sizeof(long)"
> > and switching the casts from u64 to unsigned long.
>
> Wouldn't that cause a compile warning because we have two case statements
> with the same size?

Indeed it would! Color me blind and oblivious.

> >> + default:
> >> + barrier();
> >> + __builtin_memcpy((void *)res, (const void *)p, size);
> >> + data_access_exceeds_word_size();
> >> + barrier();
> >> + }
> >> +}
> >> +
> >> +static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
> >> +{
> >> + switch (size) {
> >> + case 1: *(volatile u8 *)p = *(u8 *)res; break;
> >> + case 2: *(volatile u16 *)p = *(u16 *)res; break;
> >> + case 4: *(volatile u32 *)p = *(u32 *)res; break;
> >> +#ifdef CONFIG_64BIT
> >> + case 8: *(volatile u64 *)p = *(u64 *)res; break;
> >> +#endif
> >
> > Ditto.
> >
> >> + default:
> >> + barrier();
> >> + __builtin_memcpy((void *)p, (const void *)res, size);
> >> + data_access_exceeds_word_size();
> >> + barrier();
> >> + }
> >> +}
> >> +
> >> +/*
> >> + * Prevent the compiler from merging or refetching reads or writes. The compiler
> >> + * is also forbidden from reordering successive instances of READ_ONCE,
> >> + * ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the compiler is aware
> >> + * of some particular ordering. One way to make the compiler aware of ordering
> >> + * is to put the two invocations of READ_ONCE, ASSIGN_ONCE or ACCESS_ONCE() in
> >> + * different C statements.
> >> + *
> >> + * In contrast to ACCESS_ONCE these two macros will also work on aggregate data
> >> + * types like structs or unions. If the size of the accessed data type exceeds
> >> + * the word size of the machine (e.g. 32bit or 64bit), the access might happen
> >> + * in multiple chunks, though.
> >
> > This last sentence might be more clear if it was something like the
> > following:
> >
> > If the size of the accessed data type exceeds the word size of
> > the machine (e.g., 32 bits or 64 bits), ACCESS_ONCE() might
> > carry out the access in multiple chunks, but READ_ONCE() and
> > ASSIGN_ONCE() will give a link-time error.
>
> The code in v4 now combines Linus (memcpy) and your (error) suggestion. We do a memcpy and a compile time _warning_
>
> So I will do
>
> * In contrast to ACCESS_ONCE these two macros will also work on aggregate
> * data types like structs or unions. If the size of the accessed data
> * type exceeds the word size of the machine (e.g., 32 bits or 64 bits),
> * READ_ONCE() and ASSIGN_ONCE() will fall back to memcpy and print a
> * compile-time warning.

Very good!

Thanx, Paul

> >> + *
> >> + * These macros do absolutely -nothing- to prevent the CPU from reordering,
> >> + * merging, or refetching absolutely anything at any time. Their main intended
> >> + * use is to mediate communication between process-level code and irq/NMI
> >> + * handlers, all running on the same CPU.
> >
> > This last sentence is now obsolete. How about something like this?
> >
> > Their two major use cases are: (1) Mediating communication
> > between process-level code and irq/NMI handlers, all running
> > on the same CPU, and (2) Ensuring that the compiler does not
> > fold, spindle, or otherwise mutilate accesses that either do
> > not require ordering or that interact with an explicit memory
> > barrier or atomic instruction that provides the required ordering.
>
> sounds good. Will change.
>
> Thanks
>
> >
> >> + */
> >> +
> >> +#define READ_ONCE(p) \
> >> + ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
> >> +
> >> +#define ASSIGN_ONCE(val, p) \
> >> + ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })
> >> +
> >> #endif /* __KERNEL__ */
> >>
> >> #endif /* __ASSEMBLY__ */
> >> --
> >> 1.9.3
> >>
>

2014-12-04 14:42:00

by Paul E. McKenney

Subject: Re: [PATCH 9/9] kernel: tighten rules for ACCESS_ONCE

On Thu, Dec 04, 2014 at 10:28:25AM +0100, Christian Borntraeger wrote:
> On 04.12.2014 at 01:16, Paul E. McKenney wrote:
> >> * merging, or refetching absolutely anything at any time. Its main intended
> >> * use is to mediate communication between process-level code and irq/NMI
> >> * handlers, all running on the same CPU.
> > This comment is obsolete in the same way as that of READ_ONCE() and
> > ASSIGN_ONCE(), but probably more to the point to just get rid of
> > ACCESS_ONCE(). ;-)
> >
>
> Its now
>
> /*
> * Prevent the compiler from merging or refetching accesses. The compiler
> * is also forbidden from reordering successive instances of ACCESS_ONCE(),
> * but only when the compiler is aware of some particular ordering. One way
> * to make the compiler aware of ordering is to put the two invocations of
> * ACCESS_ONCE() in different C statements.
> *
> * ACCESS_ONCE will only work on scalar types. For union types, ACCESS_ONCE
> * on a union member will work as long as the size of the member matches the
> * size of the union and the size is smaller than word size.
> *
> * The major use cases of ACCESS_ONCE used to be (1) Mediating communication
> * between process-level code and irq/NMI handlers, all running on the same CPU,
> * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
> * mutilate accesses that either do not require ordering or that interact
> * with an explicit memory barrier or atomic instruction that provides the
> * required ordering.
> *
> * If possible use READ_ONCE/ASSIGN_ONCE instead.
> */

Looks good!

Thanx, Paul

2014-12-04 15:24:11

by Christian Borntraeger

Subject: Re: [PATCHv4 0/9] ACCESS_ONCE and non-scalar accesses

On 03.12.2014 at 23:30, Christian Borntraeger wrote:
> As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
> ACCESS_ONCE might fail with specific compilers for non-scalar accesses.
>
> Here is a set of patches to tackle that problem.
>
> The first patch introduces READ_ONCE and ASSIGN_ONCE. If the data structure
> is larger than the machine word size, memcpy is used and a warning is emitted.
> The next patches fix up all in-tree users of ACCESS_ONCE on non-scalar types.
> The last patch forces ACCESS_ONCE to work only on scalar types.
>
> I have cross-compiled the resulting kernel with defconfig and gcc 4.9 for
> microblaze, m68k, alpha, s390, x86_64, i686, sparc, sparc64, mips,
> ia64, arm and arm64.
>
> Runtime tested on s390x and x86_64. I have also verified that ASSIGN_ONCE works
> as expected with some test changes, as there are no users in this patch series.
>
> Linus, ok for the next merge window?
>
> Christian Borntraeger (9):
> kernel: Provide READ_ONCE and ASSIGN_ONCE
> mm: replace ACCESS_ONCE with READ_ONCE or barriers
> x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
> x86/gup: Replace ACCESS_ONCE with READ_ONCE
> mips/gup: Replace ACCESS_ONCE with READ_ONCE
> arm64/spinlock: Replace ACCESS_ONCE with READ_ONCE
> arm/spinlock: Replace ACCESS_ONCE with READ_ONCE
> s390/kvm: Replace ACCESS_ONCE with READ_ONCE
> kernel: tighten rules for ACCESS_ONCE
>
> arch/arm/include/asm/spinlock.h | 4 +--
> arch/arm64/include/asm/spinlock.h | 4 +--
> arch/mips/mm/gup.c | 2 +-
> arch/s390/kvm/gaccess.c | 14 ++++----
> arch/x86/include/asm/spinlock.h | 8 ++---
> arch/x86/mm/gup.c | 2 +-
> include/linux/compiler.h | 74 ++++++++++++++++++++++++++++++++++++++-
> mm/gup.c | 2 +-
> mm/memory.c | 2 +-
> mm/rmap.c | 3 +-
> 10 files changed, 95 insertions(+), 20 deletions(-)
>

FWIW, the code is on

git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git linux-next
and
git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git ACCESS_ONCE

I rebased the queue against rc3 + the initial patch that triggered the whole discussion.

Christian

2014-12-04 23:40:53

by Linus Torvalds

Subject: Re: [PATCHv4 0/9] ACCESS_ONCE and non-scalar accesses

On Wed, Dec 3, 2014 at 2:30 PM, Christian Borntraeger
<[email protected]> wrote:
>
> Linus, ok for the next merge window?

Yup, sounds like a plan to me.

Linus

2014-12-05 02:18:59

by George Spelvin

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

> +#define READ_ONCE(p) \
> + ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
> +
> +#define ASSIGN_ONCE(val, p) \
> + ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })

Minor style nit: is it necessary to name a non-pointer variable "p"?
I expect typeof(p) to be a pointer type.

(The other fun style question, which is a lot less minor, is whether
ASSIGN_ONCE should be (src,dst) as above, or (dst,src) like = and
<string.h>.)

2014-12-05 09:19:48

by Christian Borntraeger

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

On 05.12.2014 at 03:12, George Spelvin wrote:
>> +#define READ_ONCE(p) \
>> + ({ typeof(p) __val; __read_once_size(&p, &__val, sizeof(__val)); __val; })
>> +
>> +#define ASSIGN_ONCE(val, p) \
>> + ({ typeof(p) __val; __val = val; __assign_once_size(&p, &__val, sizeof(__val)); __val; })
>
> Minor style nit: is it necessary to name a non-pointer variable "p"?
> I expect typeof(p) to be a pointer type.

v might be better.
>
> (The other fun style question, which is a lot less minor, is whether
> ASSIGN_ONCE should be (src,dst) as above, or (dst,src) like = and
> <string.h>.)

I tend to prefer dst, src, but Linus used src, dst in his proposal - so I used that.


2014-12-05 16:00:26

by George Spelvin

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

>> Minor style nit: is it necessary to name a non-pointer variable "p"?
>> I expect typeof(p) to be a pointer type.

> v might be better.

The current ACCESS_ONCE uses x. I also considered "var" and "mem".

>> (The other fun style question, which is a lot less minor, is whether
>> ASSIGN_ONCE should be (src,dst) as above, or (dst,src) like = and
>> <string.h>.)

> I tend to prefer dst, src, but Linus used src, dst in his proposal -
> so I used that.

The question is, does Linus actually care (hey, Linus, do you?), or was
that just a thoughtless part of a discussion about semantics?

Because if you prefer it too, there are arguments...
There are plenty of "store" operations with (src, dst) arguments,
because the focus is on the value being stored, so it comes first.

But the name "assign" almost always refers to the ":=" operation,
with the focus more on the destination.

(Now you have me thinking about German grammar and how the destination
can be identified by the dative "dem". But even though German depends
on word order less than English, normally the dative comes first.)

2014-12-05 21:38:05

by George Spelvin

Subject: Re: [PATCH 1/9] kernel: Provide READ_ONCE and ASSIGN_ONCE

> I'd prefer it to match the put_user model, which is (val, ptr). But as long
> as there is type safety (and the whole point of the macro is that it
> figures out the type from the pointer), I guess it doesn't matter too much
> in practice.
>
> I think my original suggestion also wrote it in lower case, since it
> actually works like a function (well, template, whatever). Again kind of
> like the user copy "functions".
>
> But I don't care *that* strongly.

That's an excellent point. But it means that Christian, you're free to
do whatever seems good to you.

I'm not rabid about it, either; I just figure that now is a good time
to think carefully about it.

Personally, after running through some different names that work
with (src,dst) order I'd try the pairs:

READ_ONCE(var) / WRITE_ONCE(value,var)
or
LOAD_ONCE(var) / STORE_ONCE(value,var)

Of the two, I think I prefer the latter, because that's what the
operations are called at the hardware level, so it's most evocative of
the actual semantics. "Assignment" is a higher-level concept.

But as I said, whatever you think best. I'd just like to advocate for
some actual thinking, because it's going to be all over the kernel and
a pain to change.